Abstract

Motivated by applications in genomics, this paper studies the problem of optimal estimation
of a quadratic functional of two normal mean vectors, $Q(\mu, \theta) = \frac{1}{n}\sum_{i=1}^{n} \mu_i^2 \theta_i^2$, with a particular
focus on the case where both mean vectors are sparse. We propose optimal estimators of $Q(\mu, \theta)$
for different regimes and establish the minimax rates of convergence over a family of parameter
spaces. The optimal rates exhibit interesting phase transitions in this family. The simultaneous
signal detection problem is also considered under the minimax framework. It is shown that the
proposed estimators for $Q(\mu, \theta)$ naturally lead to optimal testing procedures.

Key Words: Detection of sparse simultaneous signals, minimax estimation, quadratic functional, 
simultaneous signals, sparse means.

1 Introduction


The problem of quadratic functional estimation occupies an important position in nonparametric and 
high-dimensional statistical inference. It is of significant interest in its own right, and also has close 
connections to other important problems such as signal detection and construction of confidence balls. 


The focus so far has been on the one-sequence case. Bickel and Ritov (1988) showed that there is an
interesting phase transition in the density estimation setting where the minimax rate of convergence
is the usual parametric rate when the density function is sufficiently smooth, and is otherwise slower
than the parametric rate.

Under the Gaussian sequence model

$$Y_i = \theta_i + \sigma z_i, \qquad i = 1, 2, \ldots, \tag{1}$$

where $z_i \overset{iid}{\sim} N(0, 1)$, Donoho and Nussbaum (1990), Fan (1991), and Efromovich and Low (1996)
further developed this theory for estimating $Q(\theta) = \sum_i \theta_i^2$ over quadratically convex parameter spaces
such as hyperrectangles or Sobolev balls. The Gaussian sequence model (1) is equivalent to the
white noise with drift model and can be used to approximate nonparametric regression and density
estimation models. Cai and Low (2005, 2006b) considered minimax and adaptive estimation of the
quadratic functional $Q(\theta)$ over parameter spaces that are not quadratically convex. It is shown that
in such a setting optimal quadratic rules are often suboptimal and nonquadratic procedures may
exhibit different phase transition phenomena than quadratic procedures. The results on estimating
the quadratic functional $Q(\theta)$ have important implications on hypothesis testing and construction of
confidence balls. See, for example, Li (1989), Dumbgen (1998), Lepski and Spokoiny (1999), Ingster
and Suslina (2003), Baraud (2004), Genovese and Wasserman (2005), and Cai and Low (2005, 2006a).

Motivated by contemporary applications in genomics, we consider in the present paper estimation
of the functional

$$Q(\mu, \theta) = \frac{1}{n}\sum_{i=1}^{n} \mu_i^2 \theta_i^2 \tag{2}$$

under the Gaussian two-sequence model

$$X_i = \mu_i + \sigma z_i, \qquad Y_i = \theta_i + \sigma z_i', \qquad i = 1, \ldots, n, \tag{3}$$

where $z_1, \ldots, z_n, z_1', \ldots, z_n' \overset{iid}{\sim} N(0, 1)$ and $\sigma$ is the noise level. The goal is to optimally estimate
the quadratic functional $Q(\mu, \theta)$ based on the observed data $(X_i, Y_i)$, $i = 1, \ldots, n$. (Strictly speaking,
$Q(\mu, \theta)$ is a quartic functional, but we will refer to it as a quadratic functional in the two-sequence
case, as it is quadratic in $\mu$ given $\theta$, and vice versa.) We are particularly interested in the case where
both mean vectors $\mu = (\mu_1, \ldots, \mu_n)$ and $\theta = (\theta_1, \ldots, \theta_n)$ are sparse.
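
To make the setup concrete, the following Python sketch simulates data from the two-sequence model and evaluates the functional $Q(\mu, \theta)$ for known means. It is an illustration only: the function names `quadratic_functional` and `simulate_two_sequence` and the toy mean vectors are assumptions introduced here, not notation from the paper.

```python
import random


def quadratic_functional(mu, theta):
    """Q(mu, theta) = (1/n) * sum_i mu_i^2 * theta_i^2."""
    n = len(mu)
    return sum((m * m) * (t * t) for m, t in zip(mu, theta)) / n


def simulate_two_sequence(mu, theta, sigma, rng):
    """Draw X_i = mu_i + sigma*z_i and Y_i = theta_i + sigma*z'_i with independent N(0,1) noise."""
    x = [m + sigma * rng.gauss(0.0, 1.0) for m in mu]
    y = [t + sigma * rng.gauss(0.0, 1.0) for t in theta]
    return x, y


rng = random.Random(0)           # fixed seed so the draw is reproducible
mu = [2.0, 0.0, 1.0, 0.0]        # hypothetical toy means
theta = [3.0, 1.0, 0.0, 0.0]
x, y = simulate_two_sequence(mu, theta, sigma=1.0, rng=rng)

# Only the first coordinate carries a simultaneous signal: (2^2 * 3^2) / 4 = 9.0
print(quadratic_functional(mu, theta))
```

Coordinates where only one of the two means is nonzero contribute nothing to $Q(\mu, \theta)$, which is why the functional vanishes exactly when there is no simultaneous signal.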

This estimation problem is motivated by the detection of simultaneous signals in genomics, where
high-throughput technologies have generated a broad array of large-scale genome-wide datasets (Schena,
Shalon, Davis, and Brown 1995; Lockhart, Brown, Wong, Chee, and Gineras 2002; Puig, Caspary,
Rigaut, Rutz, Bouveret, Bragado-Nilsson, Wilm, and Seraphin 2001). As the heterogeneous datasets
provide distinct, but often complementary, views of biological systems, an integrative approach in
data analysis is called for to obtain a coherent view of the underlying biology. As an example, it
is of great interest to connect certain genotypes to specific phenotypic outcomes to infer causal
relationships among genetic variation, expression and disease. In this regard, many genome-wide
association studies (GWAS) have identified potential disease-associated SNPs, and a natural next step
is to identify genes whose expression levels are regulated by the disease-associated SNPs. A possibly
effective integrative approach exploits the potential overlap between SNPs associated with expression
(expression SNPs) and the SNPs associated with disease (disease SNPs) for improved power in detecting
gene-disease associations (He, Fuller, Song, Meng, Zhang, Yang, and Li 2013). Recent findings
also suggest overlapping SNPs between numerous human traits (Sivakumaran, Agakov, Theodoratou,
Prendergast, Zgaga, Manolio, Rudan, McKeigue, Wilson, and Campbell 2011) and disorders
(Cotsapas, Voight, Rossin, Lage, Neale, Wallace, Abecasis, Barrett, Behrens, Giro, et al. 2011;
[...] Consortium [...]) [...]
increased power for discovering genes associated with common biological mechanism, thereby informs
on overlapping pathophysiological relationship between the disorders. Other examples where detecting
simultaneously occurring signals is of interest include the detection of shared DNA copy number
variation across samples and meta-analysis of multiple linkage studies (Zhang, Siegmund, Ji, and Li
2010).


In a simplified statistical framework, a problem of particular interest is detecting simultaneous
signals under the Gaussian two-sequence model (3). Specifically, let $\mu \star \theta = (\mu_1\theta_1, \ldots, \mu_n\theta_n)$ be the
coordinate-wise product of $\mu$ and $\theta$. For the mean vector $\mu$ (similarly, $\theta$), we say that there is a signal
at location $i$ if $\mu_i \neq 0$ (similarly, $\theta_i \neq 0$). Our goal is to detect the existence of simultaneous signals
for $\mu$ and $\theta$, which corresponds to the presence of locations $i$ with $\mu_i\theta_i \neq 0$. Equivalently, we want to
distinguish between $\mu \star \theta = 0$ and $\mu \star \theta \neq 0$. Of particular interest is the setting where the proportion
of signals is small, and the signal strengths are relatively weak. This is indeed the setting in the
gene-disease associations context, as only a small number of SNPs are expected to be associated with
a disease or to regulate gene expression level. Moreover, the association, if it exists, is weak.

As demonstrated in the single Gaussian sequence model setting considered in Cai and Low (2005),
the minimax hypothesis testing problem is closely connected to the minimax estimation theory. Our
interest in detecting the existence of simultaneous signals for the unknown mean vectors $\mu$ and $\theta$
motivates the estimation of the quadratic functional of $(\mu, \theta)$ given in (2). Note that $Q(\mu, \theta) = 0$ if
and only if $\mu \star \theta = 0$. Indeed, the study of estimation of $Q(\mu, \theta)$ turns out to highlight some important
features of the testing problem. We emphasize that the two-sequence estimation and detection problems
are not straightforward extensions of the one-sequence case, and are interesting in their own right.

Our contribution is two-fold. First, we propose optimal estimators of $Q(\mu, \theta)$ over a family of
parameter spaces to be introduced, and establish the minimax rates of convergence. It is shown that
the optimal rate exhibits interesting phase transitions in this family. Along with the establishment
of the minimax rates of convergence, we explain the intuition behind the construction of the optimal
estimators. Second, we study the simultaneous signal detection problem under the minimax framework,
and show that the proposed estimators for $Q(\mu, \theta)$ naturally lead to optimal testing procedures.
Thus, we bridge the gap between estimation and detection in the two-sequence case. Our formulation
of the simultaneous signal detection problem also provides an alternative view to that of Zhao, Cai,
and Li (2014), where the problem is studied under the mixture model framework.

The rest of the paper is organized as follows: Section 2 considers estimation of the functional $Q(\mu, \theta)$
and establishes the minimax rates of convergence. An application of the estimators of $Q(\mu, \theta)$ to the
simultaneous signal detection problem is given in Section 3. A later section complements our theoretical
study with some simulation results, and we conclude the paper with a discussion. Some
additional results that are not included in the main text are given in Appendix A. Proofs of some of
the main results are given in a separate section, with the rest relegated to Appendix B for reasons of space.


2 Optimal Estimation of $Q(\mu, \theta)$

In this section, we consider the estimation of the quadratic functional $Q(\mu, \theta) = \frac{1}{n}\sum_{i=1}^{n} \mu_i^2\theta_i^2$ of
two sparse normal mean vectors $\mu = (\mu_1, \ldots, \mu_n)$ and $\theta = (\theta_1, \ldots, \theta_n)$ under the Gaussian
two-sequence model (3). An additional constraint is also imposed on the number of coordinates that are
simultaneously nonzero for both mean vectors. The noise level $\sigma$ in model (3) is assumed to be known.
Estimation of the noise level $\sigma$ is relatively easy under the sparse sequence model (3) and will be
discussed later in the paper.

We begin by introducing some notation that will be used throughout the paper. Given a vector
$\theta = (\theta_1, \ldots, \theta_n)$, we denote by $\|\theta\|_0 = \mathrm{Card}(\{i : \theta_i \neq 0\})$ the $\ell_0$-quasi-norm of $\theta$, $\|\theta\|_2 = (\sum_{i=1}^{n} \theta_i^2)^{1/2}$
its $\ell_2$-norm, and $\|\theta\|_\infty = \max_{1 \le i \le n} |\theta_i|$ its $\ell_\infty$-norm. For any real numbers $a$ and $b$, set $a \wedge b =
\min\{a, b\}$, $a \vee b = \max\{a, b\}$ and $a_+ = a \vee 0$. Throughout, the notation $a_n \asymp b_n$ means that there
exist some numerical constants $c$ and $C$ such that $c \le a_n / b_n \le C$ when $n$ is large. By “numerical
constants” we usually mean constants that might depend on the characteristics of the problem but
whose specific values are of little interest to us. The precise values of the numerical constants $c$ and
$C$ may also vary from line to line.

Adopting an asymptotic framework where the vector size $n$ is the driving variable, we parametrize
the signal strength, sparsity, and simultaneous sparsity of $\mu$ and $\theta$ as functions of $n$. Specifically, we
consider the family of parameter spaces

$$\Omega(\beta, \epsilon, b) = \{(\mu, \theta) \in \mathbb{R}^n \times \mathbb{R}^n : \|\mu\|_0 \le k_n,\ \|\mu\|_\infty \le s_n,\ \|\theta\|_0 \le k_n,\ \|\theta\|_\infty \le s_n,\ \|\mu \star \theta\|_0 \le q_n\}, \tag{4}$$

indexed by three parameters $\beta$, $\epsilon$, and $b$. We have the sparsity parametrization

$$k_n = n^{\beta}, \tag{5}$$

the simultaneous sparsity parametrization

$$q_n = n^{\epsilon}, \qquad 0 < \epsilon < \beta, \tag{6}$$

and the signal strength parametrization

$$s_n = n^{b}, \qquad b \in \mathbb{R}. \tag{7}$$

In other words, $\Omega(\beta, \epsilon, b)$ is the collection of vector pairs $(\mu, \theta) \in \mathbb{R}^n \times \mathbb{R}^n$, where both $\mu$ and $\theta$ have at
most $k_n$ nonzero entries, each entry is bounded in magnitude by $s_n$, and the number of simultaneous
nonzero entries for $\mu$ and $\theta$ is at most $q_n$. In principle, $\beta$ can take any value between 0 and 1. We are
primarily interested in the estimation problem for the range $0 < \beta < \frac{1}{2}$, as it is well known that this
corresponds to the case of rare signals (Donoho and Jin 2004). Also, even though we parametrize
the signal strength at the algebraic order $s_n = n^{b}$, throughout we will remark on the estimation
result for $s_n$ of order $\sqrt{\log n}$, since this is an interesting region in the one-sequence signal detection
problem (Donoho and Jin 2004).
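
The calibration above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names `calibrate` and `in_parameter_space` are hypothetical, and rounding $n^{\beta}$ and $n^{\epsilon}$ down to integers is a convention chosen here for concreteness (the asymptotic statements do not depend on it).

```python
import math


def calibrate(n, beta, eps, b):
    """Return (k_n, q_n, s_n) = (n^beta, n^eps, n^b), with the two counts rounded down."""
    if not (0 < eps < beta < 1):
        raise ValueError("expected 0 < eps < beta < 1")
    k_n = int(math.floor(n ** beta))   # max number of nonzero entries per vector
    q_n = int(math.floor(n ** eps))    # max number of simultaneously nonzero entries
    s_n = n ** b                       # bound on the magnitude of each entry
    return k_n, q_n, s_n


def in_parameter_space(mu, theta, k_n, q_n, s_n):
    """Check the five constraints that define the parameter space."""
    nnz_mu = sum(1 for m in mu if m != 0)
    nnz_theta = sum(1 for t in theta if t != 0)
    nnz_both = sum(1 for m, t in zip(mu, theta) if m != 0 and t != 0)
    return (nnz_mu <= k_n and nnz_theta <= k_n and nnz_both <= q_n
            and max(abs(m) for m in mu) <= s_n
            and max(abs(t) for t in theta) <= s_n)


n = 10_000
k_n, q_n, s_n = calibrate(n, beta=0.45, eps=0.2, b=0.1)

# mu is nonzero on the first k_n coordinates; theta is shifted so that
# exactly q_n coordinates are nonzero in both vectors.
mu = [s_n] * k_n + [0.0] * (n - k_n)
theta = [0.0] * (k_n - q_n) + [s_n] * k_n + [0.0] * (n - 2 * k_n + q_n)
print(k_n, q_n, in_parameter_space(mu, theta, k_n, q_n, s_n))
```

Taking `theta = mu` instead would violate the simultaneous sparsity constraint, since all $k_n$ nonzero coordinates would then overlap.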


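Our goal is to derive the minimax rate of convergence for $Q(\mu, \theta)$ over $\Omega(\beta, \epsilon, b)$:

$$R^*(n, \Omega(\beta, \epsilon, b)) = \inf_{\hat{Q}} \sup_{(\mu, \theta) \in \Omega(\beta, \epsilon, b)} E_{(\mu, \theta)}(\hat{Q} - Q(\mu, \theta))^2.$$

We will show that $R^*(n, \Omega(\beta, \epsilon, b))$ satisfies

$$R^*(n, \Omega(\beta, \epsilon, b)) \asymp \gamma_n(\beta, \epsilon, b), \tag{8}$$

where $\gamma_n(\beta, \epsilon, b)$ is a function of $n$ indexed by $\beta$, $\epsilon$ and $b$. There are two main tasks in establishing the
minimax rate of convergence. For each triple $(\beta, \epsilon, b)$ satisfying $0 < \epsilon < \beta < \frac{1}{2}$ and $b \in \mathbb{R}$, we

(a) construct an estimator $\hat{Q}^*$ that satisfies

$$\sup_{(\mu, \theta) \in \Omega(\beta, \epsilon, b)} E_{(\mu, \theta)}(\hat{Q}^* - Q(\mu, \theta))^2 \le C \gamma_n(\beta, \epsilon, b),$$

(b) and show that

$$R^*(n, \Omega(\beta, \epsilon, b)) \ge c \gamma_n(\beta, \epsilon, b),$$

where $C$ and $c$ are numerical constants that depend only on $\beta$, $\epsilon$, $b$, and $\sigma$. Combining the upper bound
derived in task (a) and the lower bound derived in task (b) yields the minimax rate of convergence (8).
In this case, we say that the estimator $\hat{Q}^*$ attains the minimax rate of convergence over the parameter
space $\Omega(\beta, \epsilon, b)$.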

Interestingly, the estimation problem exhibits different phase transitions for the minimax rate
$\gamma_n(\beta, \epsilon, b)$ in three regimes: the sparse regime where $0 < \epsilon < \frac{\beta}{2}$, the moderately dense regime where
$\frac{\beta}{2} < \epsilon \le \frac{3\beta}{4}$, and the strongly dense regime where $\frac{3\beta}{4} < \epsilon < \beta$. Collectively, we call $\frac{\beta}{2} < \epsilon < \beta$ the
dense regime. In the sparse regime, simultaneous signal is sparse in the sense that $q_n \ll \sqrt{k_n}$, while in
the dense regime, simultaneous signal is dense in the sense that $q_n \gg \sqrt{k_n}$. This is analogous to the
terminology used in the one-sequence model, where signal is called sparse if $0 < \beta < \frac{1}{2}$ ($k_n \ll \sqrt{n}$), and
dense if $\frac{1}{2} < \beta < 1$ ($k_n \gg \sqrt{n}$). The key distinction is that, in the two-sequence case, the sparseness
or denseness is used to describe the relationship between simultaneous sparsity $q_n$ and sparsity $k_n$, as
opposed to between $k_n$ and the vector size $n$. We also remark that our use of the terminology is not
superficial: a detailed analysis of the lower and upper bounds for the estimation problem does
reveal an intimate connection to the corresponding regimes in the one-sequence case.

Intuitively, when $b$ is very small (i.e., signal is very weak), we are better off estimating $Q(\mu, \theta)$ by

$$\hat{Q}_0 = 0, \tag{9}$$

since any attempt to estimate $Q(\mu, \theta)$ will incur a greater estimation risk. On the other hand, when
$b$ is sufficiently large (i.e., signal is strong), it is desirable to estimate $Q(\mu, \theta)$ based on the observed
data $(X_i, Y_i)$, $i = 1, \ldots, n$. With a slight abuse of terminology, we say that the signal is weak if it
corresponds to the region where $\hat{Q}_0$ is optimal, and we say that the signal is strong otherwise. In
Sections 2.1 and 2.2, we construct two estimators of $Q(\mu, \theta)$ that attain the minimax rates of convergence
over the sparse and dense regimes, respectively, when the signal is sufficiently strong.


Note that it is possible to generalize our parametrization to the case where $\mu$ and $\theta$ have different
levels of both sparsity and signal strength. This amounts to estimating $Q(\mu, \theta)$ over the parameter
space

$$\Omega(\alpha, \beta, \epsilon, a, b) = \{(\mu, \theta) \in \mathbb{R}^n \times \mathbb{R}^n : \|\mu\|_0 \le j_n,\ \|\mu\|_\infty \le r_n,\ \|\theta\|_0 \le k_n,\ \|\theta\|_\infty \le s_n,\ \|\mu \star \theta\|_0 \le q_n\}, \tag{10}$$

where $j_n = n^{\alpha}$, $k_n = n^{\beta}$, $q_n = n^{\epsilon}$ with $0 < \epsilon < \alpha \wedge \beta < \frac{1}{2}$, and $r_n = n^{a}$, $s_n = n^{b}$ with $a, b \in \mathbb{R}$. In this
section, however, we will focus on the simplest case where $j_n = k_n = n^{\beta}$ and $r_n = s_n = n^{b}$, since the
technical analysis is similar to that for the more general case (10) but less tedious. We did derive the
minimax rate of convergence for the case where $j_n = k_n = n^{\beta}$ but $r_n$ and $s_n$ are allowed to differ. As
the phase transitions for the minimax rates of convergence in this case are much more sophisticated
and less easily digestible, we defer their presentation to Appendix A.1. The analysis for
the general case (10), where no constraint is imposed on either the sparsity or signal strength of $\mu$ and
$\theta$, follows similarly, provided that the magnitude of the simultaneous sparsity $\epsilon$ is compared to $\alpha$ if
$a > b$, and to $\beta$ if $b > a$, for the determination of sparse and dense regimes.


2.1 Estimation in the Sparse Regime 

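We begin with the estimation of $Q(\mu, \theta) = \frac{1}{n}\sum_{i=1}^{n} \mu_i^2\theta_i^2$ over the parameter space $\Omega(\beta, \epsilon, b)$ in the sparse
regime, where $q_n$ is calibrated as in expression (6) with $0 < \epsilon < \frac{\beta}{2}$.

To construct an optimal estimator for $Q(\mu, \theta)$, we base our intuition on the estimation of the
quadratic functional $Q(\theta) = \frac{1}{n}\sum_{i=1}^{n} \theta_i^2$ in the case where we only have one sequence of observations $Y_i$,
$i = 1, \ldots, n$, from model (3). Consider the family of parameter spaces indexed by $k_n = n^{\beta}$, $0 < \beta < 1$
and $s_n = n^{b}$, $b \in \mathbb{R}$:

$$\Theta(\beta, b) = \{\theta \in \mathbb{R}^n : \|\theta\|_0 \le k_n,\ \|\theta\|_\infty \le s_n\}. \tag{11}$$

That is, $\Theta(\beta, b)$ is the collection of vectors in $\mathbb{R}^n$ that have at most $k_n$ nonzero entries uniformly
bounded in magnitude by $s_n$. It can be shown that for $0 < \beta < \frac{1}{2}$, the minimax rate of convergence
for $Q(\theta)$ over $\Theta(\beta, b)$ satisfies

$$R^*(n, \Theta(\beta, b)) := \inf_{\hat{Q}} \sup_{\theta \in \Theta(\beta, b)} E_\theta(\hat{Q} - Q(\theta))^2 \asymp \gamma_n(\beta, b), \tag{12}$$

where

$$\gamma_n(\beta, b) = \begin{cases} n^{2\beta + 4b - 2} & \text{if } b \le 0, \\ n^{2\beta - 2}(\log n)^2 & \text{if } 0 < b \le \frac{\beta}{2}, \\ n^{\beta + 2b - 2} & \text{if } b > \frac{\beta}{2}. \end{cases} \tag{13}$$

Moreover, the minimax rate of convergence when $s_n = \sigma\sqrt{d \log n}$ for some $d > 0$ satisfies

$$\inf_{\hat{Q}} \sup_{\theta :\ \|\theta\|_0 \le k_n,\ \|\theta\|_\infty \le \sigma\sqrt{d \log n}} E_\theta(\hat{Q} - Q(\theta))^2 \asymp n^{2\beta - 2}(\log n)^2.$$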
Thus, the phase transition of $\gamma_n(\beta, b)$ from $b \le 0$ to $b > 0$ is smooth. The special interest in signal
strength of order $\sqrt{\log n}$ has its roots in the one-sequence signal detection problem, which
we will discuss in more detail in Section 3.

When $0 < \beta < \frac{1}{2}$, we have $k_n \ll \sqrt{n}$. Thus, we anticipate only very few coordinates of $\theta$ to be
nonzero. If, in addition, $b \le 0$, then the signal is both rare and weak, and one can do no better than
simply estimating $Q(\theta)$ by $\hat{Q}_0 = 0$. Nonetheless, when $b > 0$, the signal is rare but sufficiently strong, and
the estimator

$$\hat{Q}_1 = \frac{1}{n}\sum_{i=1}^{n} \left[(Y_i^2 - \sigma^2\tau_n)_+ - \theta_0\right], \qquad \text{where } \theta_0 := E_0(Y_1^2 - \sigma^2\tau_n)_+, \tag{14}$$

which performs coordinate-wise thresholding on $Y_i^2$ with the tuning parameter $\tau_n = 2\log n$,
is optimal. Note that each term $\theta_i^2$ is estimated independently by $(Y_i^2 - \sigma^2\tau_n)_+ - \theta_0$, since the
sparsity pattern is unstructured. The estimator (14) involves a thresholding step, $(Y_i^2 - \sigma^2\tau_n)_+$, for
denoising, and a de-biasing step that subtracts $\theta_0$ from the thresholded term so that we estimate the zero
coordinates of $\theta$ unbiasedly. This is important because the proportion of zero entries in this case is
relatively large, and a biased estimator for these coordinates will unnecessarily inflate the estimation
risk. When $s_n = \sigma\sqrt{d \log n}$ for some $d > 0$, we are indifferent in terms of estimation, since both $\hat{Q}_0$
and $\hat{Q}_1$ attain the minimax rate of convergence.
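
A minimal Python sketch of the thresholding estimator in (14) follows. The function names are hypothetical, and the closed form used for the de-biasing constant $\theta_0$ is an assumption of this sketch: it is obtained here by integrating $(z^2 - \tau)\varphi(z)$ over $|z| > \sqrt{\tau}$ for a standard normal density $\varphi$, and is not quoted from the paper.

```python
import math


def null_threshold_mean(tau, sigma=1.0):
    """theta_0 = E_0 (Y^2 - sigma^2 * tau)_+ for Y ~ N(0, sigma^2).

    Closed form (derived for this sketch):
    sigma^2 * 2 * [sqrt(tau) * phi(sqrt(tau)) + (1 - tau) * P(Z > sqrt(tau))].
    """
    t = math.sqrt(tau)
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)   # N(0,1) density at t
    upper_tail = 0.5 * math.erfc(t / math.sqrt(2.0))          # P(Z > t)
    return sigma ** 2 * 2.0 * (t * phi + (1.0 - tau) * upper_tail)


def q1_hat(y, sigma=1.0):
    """One-sequence estimator: threshold Y_i^2 at sigma^2 * tau_n, then subtract theta_0."""
    n = len(y)
    tau = 2.0 * math.log(n)                                   # tau_n = 2 log n
    theta0 = null_threshold_mean(tau, sigma)
    cut = sigma ** 2 * tau
    return sum(max(v * v - cut, 0.0) - theta0 for v in y) / n


# Noiseless toy input: one strong coordinate out of n = 100.
y = [10.0] + [0.0] * 99
print(q1_hat(y))   # close to, but below, the target (1/n) * 10^2 = 1.0
```

The gap between the output and the target value 1.0 on this toy input is the thresholding bias of order $\sigma^2\tau_n / n$ incurred on the single nonzero coordinate.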

We now return to the estimation of $Q(\mu, \theta)$ in the two-sequence case, where $0 < \epsilon < \frac{\beta}{2}$ and
$0 < \beta < \frac{1}{2}$. In this case, $k_n \ll \sqrt{n}$, so the signal of individual sequences is rare. Moreover, the
simultaneous sparsity $q_n \ll \sqrt{k_n}$, implying that knowledge about whether $\mu_i$ is nonzero does not
entail much about whether $\theta_i$ is nonzero (and vice versa). This motivates the estimator

$$\hat{Q}_2 = \frac{1}{n}\sum_{i=1}^{n} \left[(X_i^2 - \sigma^2\tau_n)_+ - \mu_0\right]\left[(Y_i^2 - \sigma^2\tau_n)_+ - \theta_0\right], \qquad \mu_0 = \theta_0 := E_0(Y_1^2 - \sigma^2\tau_n)_+ \tag{15}$$

in the case of sufficiently strong signal, where the threshold level $\tau_n = \log n$. The construction of $\hat{Q}_2$ is
a straightforward extension of the construction of $\hat{Q}_1$: each term $\mu_i^2\theta_i^2$ is estimated independently by
the product $[(X_i^2 - \sigma^2\tau_n)_+ - \mu_0][(Y_i^2 - \sigma^2\tau_n)_+ - \theta_0]$. Since $q_n \ll \sqrt{k_n}$, following our previous argument,
thresholding $X_i^2$ and $Y_i^2$ independently seems reasonable.
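
The product form of the estimator in (15) can be sketched as follows. As before, the function names are hypothetical and the closed form for the null mean $\mu_0 = \theta_0$ is derived for this sketch under the standard normal assumption rather than taken from the paper.

```python
import math


def null_threshold_mean(tau, sigma=1.0):
    """E_0 (Y^2 - sigma^2 * tau)_+ for Y ~ N(0, sigma^2), in closed form (derived for this sketch)."""
    t = math.sqrt(tau)
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(t / math.sqrt(2.0))
    return sigma ** 2 * 2.0 * (t * phi + (1.0 - tau) * upper_tail)


def q2_hat(x, y, sigma=1.0):
    """Sparse-regime estimator: threshold X_i^2 and Y_i^2 separately, de-bias, then multiply."""
    n = len(x)
    tau = math.log(n)                          # tau_n = log n
    cut = sigma ** 2 * tau
    m0 = null_threshold_mean(tau, sigma)       # mu_0 = theta_0
    total = 0.0
    for a, b in zip(x, y):
        total += (max(a * a - cut, 0.0) - m0) * (max(b * b - cut, 0.0) - m0)
    return total / n


# Noiseless toy input with a single simultaneous signal at the first coordinate.
n = 100
x = [10.0] + [0.0] * (n - 1)
y = [10.0] + [0.0] * (n - 1)
print(q2_hat(x, y))   # target is (1/n) * 10^2 * 10^2 = 100; thresholding shrinks the estimate
```

Because each factor is thresholded on its own, a coordinate contributes substantially only when both $X_i^2$ and $Y_i^2$ exceed $\sigma^2\tau_n$, which matches the intuition that simultaneous signals are rare in this regime.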

We now present a theorem on the upper bound of the mean squared error of $\hat{Q}_2$.


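Theorem 1 (Sparse Regime: Upper Bound). For $b > 0$, the estimator $\hat{Q}_2$ as in (15) with $\tau_n = \log n$
satisfies

$$\sup_{(\mu, \theta) \in \Omega(\beta, \epsilon, b)} E_{(\mu, \theta)}(\hat{Q}_2 - Q(\mu, \theta))^2 \le C\left[n^{2\epsilon + 4b - 2}(\log n)^2 + n^{\epsilon + 6b - 2}\right]. \tag{16}$$

Straightforward calculation shows that for the estimator $\hat{Q}_0 = 0$,

$$\sup_{(\mu, \theta) \in \Omega(\beta, \epsilon, b)} E_{(\mu, \theta)}(\hat{Q}_0 - Q(\mu, \theta))^2 = \sup_{(\mu, \theta) \in \Omega(\beta, \epsilon, b)} Q^2(\mu, \theta) = \left(\frac{1}{n} q_n s_n^4\right)^2 = n^{2\epsilon + 8b - 2} \tag{17}$$

for $0 < \epsilon < \beta < \frac{1}{2}$ and $b \in \mathbb{R}$. We now show that the combination of $\hat{Q}_0$ (when $b \le 0$) and $\hat{Q}_2$ (when
$b > 0$) is optimal, by providing a matching lower bound.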

Theorem 2 (Sparse Regime: Lower Bound). Let $0 < \epsilon < \frac{\beta}{2}$ and $0 < \beta < \frac{1}{2}$. Then

$$R^*(n, \Omega(\beta, \epsilon, b)) \ge c\gamma_n(\beta, \epsilon, b),$$

where

$$\gamma_n(\beta, \epsilon, b) = \begin{cases} n^{2\epsilon + 8b - 2} & \text{if } b \le 0, \\ n^{2\epsilon + 4b - 2}(\log n)^2 & \text{if } 0 < b \le \frac{\epsilon}{2}, \\ n^{\epsilon + 6b - 2} & \text{if } b > \frac{\epsilon}{2}. \end{cases} \tag{18}$$

Crucial to the derivation of the lower bound is the Constrained Risk Inequality (CRI) given in Brown
and Low (1996). To apply the CRI, it suffices to construct two priors supported on $\Omega(\beta, \epsilon, b)$ that have
small chi-square distance but a large difference in the expected values of the resulting quadratic
functionals. The cases $b \le \frac{\epsilon}{2}$ and $b > \frac{\epsilon}{2}$ correspond to choices of distinct pairs of priors. For $b > \frac{\epsilon}{2}$,
the CRI boils down to the standard technique of inscribing a hardest hyperrectangle, with the Bayes
risk for a simple prior supported on the hyperrectangle being a lower bound for the minimax risk.
By contrast, the case $b \le \frac{\epsilon}{2}$ requires the use of a rich collection of hyperrectangles and a mixture prior
which mixes over the vertices of the hyperrectangles in this collection. Mixing increases the difficulty
of the Bayes estimation problem and is needed here to attain a sharp lower bound.

Remark 1. Combining (16), (17) and (18), we see that when $0 < \epsilon < \frac{\beta}{2}$ and $0 < \beta < \frac{1}{2}$, $\hat{Q}_2$ attains
the optimal rate of convergence over $\Omega(\beta, \epsilon, b)$ when $b > 0$. In contrast, $\hat{Q}_0$ attains the optimal rate
of convergence over $\Omega(\beta, \epsilon, b)$ when $b \le 0$.

Remark 2. Interestingly, there is no dependence on $\beta$ in the minimax rate of convergence $\gamma_n(\beta, \epsilon, b)$
in the sparse regime, except for the requirement that $0 < \epsilon < \frac{\beta}{2}$.


2.2 Estimation in the Dense Regime 

We now consider estimating $Q(\mu, \theta)$ in the dense regime, where $q_n$ is calibrated as in expression (6)
with $\frac{\beta}{2} < \epsilon < \beta$. The dense regime is subdivided into two cases: the moderately dense case with
$\frac{\beta}{2} < \epsilon \le \frac{3\beta}{4}$ and the strongly dense case with $\frac{3\beta}{4} < \epsilon < \beta$.

In the dense regime, the estimator $\hat{Q}_2$ defined in (15) is suboptimal, as the thresholding step in
both $X_i^2$ and $Y_i^2$ ends up thresholding too many coordinates when the signal is weak. Note that the
simultaneous sparsity $q_n \gg \sqrt{k_n}$ suggests that for each coordinate $i$ with $\mu_i \neq 0$, it is usually the case
that $\theta_i \neq 0$, and vice versa. Therefore, it is no longer reasonable to perform thresholding on $X_i^2$ and
$Y_i^2$ independently. The additional knowledge of a relatively high proportion of simultaneous nonzero
entries suggests that whenever we observe a large value of $X_i^2$ (an implication of $\mu_i \neq 0$), then even
if $Y_i^2$ is small, we should still estimate $\theta_i^2$ rather than setting it equal to zero. The same reasoning
applies to the case where $X_i^2$ is small but $Y_i^2$ is large.

To construct an optimal estimator in the dense regime, we again borrow some intuition from the
estimation of the quadratic functional $Q(\theta) = \frac{1}{n}\sum_{i=1}^{n} \theta_i^2$ in the one-sequence case. We consider the family
of parameter spaces given in (11), but for $\frac{1}{2} < \beta < 1$. The minimax rate of convergence once again
satisfies (12), but with

$$\gamma_n(\beta, b) = \begin{cases} n^{2\beta + 4b - 2} & \text{if } b \le \frac{1 - 2\beta}{4}, \\ n^{-1} & \text{if } \frac{1 - 2\beta}{4} < b \le \frac{1 - \beta}{2}, \\ n^{\beta + 2b - 2} & \text{if } b > \frac{1 - \beta}{2}. \end{cases} \tag{19}$$

When $\frac{1}{2} < \beta < 1$, we have $k_n \gg \sqrt{n}$, meaning that many coordinates of $\theta$ are nonzero. The
characterization of weak and strong signal is no longer $b \le 0$ versus $b > 0$ as in the case of $0 < \beta < \frac{1}{2}$, but
$b \le \frac{1 - 2\beta}{4}$ versus $b > \frac{1 - 2\beta}{4}$. That is, given the same signal strength $b$, the vast number of nonzero
coordinates of $\theta$ when $k_n \gg \sqrt{n}$ collectively represents stronger signal as compared to the case when
$k_n \ll \sqrt{n}$. Thus, the threshold of “strong” signal as encoded by $b$ is lowered when $k_n \gg \sqrt{n}$. It is not
surprising that for the range of weak signal $b \le \frac{1 - 2\beta}{4}$, the estimator $\hat{Q}_0 = 0$ is optimal. On the other
hand, when $b > \frac{1 - 2\beta}{4}$, the optimal estimator for $Q(\theta)$ is the unbiased estimator

$$\hat{Q}_3 = \frac{1}{n}\sum_{i=1}^{n} (Y_i^2 - \sigma^2). \tag{20}$$

An optimal estimator is often one that strikes an appropriate balance between bias and variance in
its mean squared error. The estimators $\hat{Q}_0$ and $\hat{Q}_3$ represent two extremes in terms of the bias-variance
tradeoff. We see that $\hat{Q}_0$, which is optimal for exceedingly weak signal, has zero variance, while $\hat{Q}_3$, which is
optimal for sufficiently strong signal, has zero bias. Due to the denseness of nonzero coordinates when
$k_n \gg \sqrt{n}$, one cannot afford to introduce bias to the estimator in the hope of achieving smaller
variance. Without additional information about the sparsity structure, the unbiased estimator is
necessary for optimal estimation of $Q(\theta)$.
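
The one-sequence rates in displays (13) and (19) can be summarized by a single exponent function, sketched below in Python. The function name `one_sequence_exponent` is hypothetical; logarithmic factors are ignored, and the breakpoints are placed where consecutive pieces of the rate agree, which is an assumption of this sketch wherever the displays are read at their boundaries.

```python
def one_sequence_exponent(beta, b):
    """Exponent r such that the one-sequence minimax rate is n^r, up to log factors."""
    if not (0 < beta < 1) or beta == 0.5:
        raise ValueError("expected beta in (0, 1/2) or (1/2, 1)")
    if beta < 0.5:
        # Rare signals, display (13): Q_0 = 0 is optimal for b <= 0.
        if b <= 0:
            return 2 * beta + 4 * b - 2
        if b <= beta / 2:
            return 2 * beta - 2
        return beta + 2 * b - 2
    # Dense signals, display (19): the weak/strong boundary moves to (1 - 2*beta)/4.
    if b <= (1 - 2 * beta) / 4:
        return 2 * beta + 4 * b - 2
    if b <= (1 - beta) / 2:
        return -1.0
    return beta + 2 * b - 2


print(one_sequence_exponent(0.25, 0.125))   # -1.5 (middle piece of the rare-signal case)
print(one_sequence_exponent(0.75, 0.0))     # -1.0 (parametric-rate plateau of the dense case)
```

In the dense case the plateau at exponent $-1$ is the rate attained by the unbiased estimator whose variance is dominated by the noise term.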

We now return to the two-sequence setting for the estimation of $Q(\mu, \theta)$, for the case $\frac{\beta}{2} < \epsilon < \beta$ and
$0 < \beta < \frac{1}{2}$. Although the signal for individual sequences is sparse ($k_n \ll \sqrt{n}$), the simultaneous signal
is dense in the sense that $q_n \gg \sqrt{k_n}$. The intuition garnered from the one-sequence case motivates
the following estimator:

$$\hat{Q}_4 = \frac{1}{n}\sum_{i=1}^{n} \left[(X_i^2 - \sigma^2)(Y_i^2 - \sigma^2)\,1(X_i^2 \vee Y_i^2 > \sigma^2\tau_n) - \eta\right], \tag{21}$$

where

$$\eta = E_{(0, 0)}\left[(X_1^2 - \sigma^2)(Y_1^2 - \sigma^2)\,1(X_1^2 \vee Y_1^2 > \sigma^2\tau_n)\right].$$

From $\hat{Q}_4$, we see that each term $\mu_i^2\theta_i^2$ is estimated unbiasedly (modulo $\eta$) by $(X_i^2 - \sigma^2)(Y_i^2 - \sigma^2)$
whenever at least one of $X_i^2$ and $Y_i^2$ is sufficiently large. This is in accordance with our previous
argument that estimation should be done whenever we have at least one large value of $X_i^2$ or $Y_i^2$.
The threshold $\tau_n$ is a tuning parameter whose value is determined during the analysis of
the mean squared error of $\hat{Q}_4$; it turns out that $\tau_n = c\log n$ for any $c \ge 4$ attains the optimal
rate of convergence. The subtraction of $\eta$ from $(X_i^2 - \sigma^2)(Y_i^2 - \sigma^2)\,1(X_i^2 \vee Y_i^2 > \sigma^2\tau_n)$ is needed
because the majority of coordinates $i$ have $\mu_i = \theta_i = 0$. A biased estimator for these coordinates
unavoidably inflates the estimation risk. Nevertheless, due to the rarity of nonzero coordinates in
individual sequences, the naive unbiased estimator

$$\frac{1}{n}\sum_{i=1}^{n} (X_i^2 - \sigma^2)(Y_i^2 - \sigma^2) \tag{22}$$

is not optimal, as one would have expected. A thresholding step $1(X_i^2 \vee Y_i^2 > \sigma^2\tau_n)$ is needed to
guard against estimating entries with $\mu_i = \theta_i = 0$ with noise.
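
The dense-regime estimator in (21) can be sketched in Python as follows. The function names are hypothetical. The closed form for $\eta$ is an assumption of this sketch: since $(X_1^2 - \sigma^2)(Y_1^2 - \sigma^2)$ has mean zero under $\mu_1 = \theta_1 = 0$ and the two factors are independent, $\eta = -\left(E[(X_1^2 - \sigma^2)1(X_1^2 \le \sigma^2\tau_n)]\right)^2 = -4\sigma^4\tau_n\varphi(\sqrt{\tau_n})^2$, with $\varphi$ the standard normal density.

```python
import math


def eta_null(tau, sigma=1.0):
    """eta = E_(0,0)[(X^2 - s2)(Y^2 - s2) 1(max(X^2, Y^2) > s2 * tau)], with s2 = sigma^2.

    Closed form derived for this sketch: -4 * sigma^4 * tau * phi(sqrt(tau))^2.
    """
    phi = math.exp(-0.5 * tau) / math.sqrt(2.0 * math.pi)   # N(0,1) density at sqrt(tau)
    return -4.0 * sigma ** 4 * tau * phi * phi


def q4_hat(x, y, sigma=1.0, c=4.0):
    """Dense-regime estimator: keep the unbiased product whenever X_i^2 or Y_i^2 is large."""
    n = len(x)
    tau = c * math.log(n)                                   # tau_n = c log n
    s2 = sigma ** 2
    eta = eta_null(tau, sigma)
    total = 0.0
    for a, b in zip(x, y):
        if max(a * a, b * b) > s2 * tau:                    # joint threshold on the maximum
            total += (a * a - s2) * (b * b - s2)
        total -= eta                                        # de-bias every coordinate
    return total / n


# Noiseless toy input: X_1 is large, Y_1 is moderate and would not pass a threshold on its own.
n = 100
x = [10.0] + [0.0] * (n - 1)
y = [3.0] + [0.0] * (n - 1)
print(q4_hat(x, y))   # roughly (99 * 8) / 100 = 7.92; the target is (100 * 9) / 100 = 9
```

On this input a separately thresholded product would discard the first coordinate because $Y_1^2 = 9$ lies below $\sigma^2\tau_n \approx 18.4$, whereas the joint indicator retains it.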


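Note that $\hat{Q}_2$ defined in (15) can be written as

$$\hat{Q}_2 = \frac{1}{n}\sum_{i=1}^{n} \left[(X_i^2 - \sigma^2\tau_n)\,1(X_i^2 > \sigma^2\tau_n) - \mu_0\right]\left[(Y_i^2 - \sigma^2\tau_n)\,1(Y_i^2 > \sigma^2\tau_n) - \theta_0\right].$$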

Comparing this expression with $\hat{Q}_4$, we see that when both $X_i^2$ and $Y_i^2$ are large, the term $\mu_i^2\theta_i^2$ is
roughly estimated as $(X_i^2 - \sigma^2\tau_n)(Y_i^2 - \sigma^2\tau_n)$. Moreover, $(X_i^2 - \sigma^2\tau_n)(Y_i^2 - \sigma^2\tau_n)$ is a biased estimator
of $\mu_i^2\theta_i^2$ when $\tau_n > 1$.

We present an upper bound on the mean squared error of $\hat{Q}_4$ in the following theorem.


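Theorem 3 (Dense Regime: Upper Bound). For $b > 0$, the estimator $\hat{Q}_4$ as in (21) with $\tau_n = 4\log n$
satisfies

$$\sup_{(\mu, \theta) \in \Omega(\beta, \epsilon, b)} E_{(\mu, \theta)}(\hat{Q}_4 - Q(\mu, \theta))^2 \le C\max\left\{n^{2\epsilon - 2}(\log n)^4,\ n^{\beta + 4b - 2},\ n^{\epsilon + 6b - 2}\right\}. \tag{23}$$

We now provide a matching lower bound to complement the upper bound in the dense regime.

Theorem 4 (Dense Regime: Lower Bound). Let $\frac{\beta}{2} < \epsilon < \beta$ and $0 < \beta < \frac{1}{2}$. Then

$$R^*(n, \Omega(\beta, \epsilon, b)) \ge c\gamma_n(\beta, \epsilon, b),$$

where

$$\gamma_n(\beta, \epsilon, b) = \begin{cases} n^{2\epsilon + 8b - 2} & \text{if } b \le 0, \\ n^{2\epsilon - 2}(\log n)^4 & \text{if } 0 < b \le \frac{2\epsilon - \beta}{4}, \\ n^{\beta + 4b - 2} & \text{if } \frac{2\epsilon - \beta}{4} < b \le \frac{\beta - \epsilon}{2}, \\ n^{\epsilon + 6b - 2} & \text{if } b > \frac{\beta - \epsilon}{2}, \end{cases} \tag{24}$$

when $\frac{\beta}{2} < \epsilon \le \frac{3\beta}{4}$, and

$$\gamma_n(\beta, \epsilon, b) = \begin{cases} n^{2\epsilon + 8b - 2} & \text{if } b \le 0, \\ n^{2\epsilon - 2}(\log n)^4 & \text{if } 0 < b \le \frac{\epsilon}{6}, \\ n^{\epsilon + 6b - 2} & \text{if } b > \frac{\epsilon}{6}, \end{cases} \tag{25}$$

when $\frac{3\beta}{4} < \epsilon < \beta$.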


The minimax rates of convergence display different phase transitions within two subdivisions of
the dense regime. In the moderately dense regime where $\frac{\beta}{2} < \epsilon \le \frac{3\beta}{4}$, there are phase transitions at
$b = \frac{2\epsilon - \beta}{4}$ and $b = \frac{\beta - \epsilon}{2}$, given in (24). Note that $\frac{2\epsilon - \beta}{4} < \frac{\beta - \epsilon}{2}$ if and only if $\epsilon < \frac{3\beta}{4}$. In the strongly dense
regime where $\epsilon > \frac{3\beta}{4}$, the phase $\frac{2\epsilon - \beta}{4} < b \le \frac{\beta - \epsilon}{2}$ is non-existent, and we only have one intermediate
phase $0 < b \le \frac{\epsilon}{6}$, given in (25).

We establish the lower bound by constructing least favorable priors and applying the CRI. Except for
the rate $n^{\epsilon + 6b - 2}$, which is obtained through the inscription of a hardest hyperrectangle, all other cases
require some form of mixing over the vertices of a rich collection of hyperrectangles.

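Remark 3. Combining (17), (23), (24), and (25), we see that for the parameter space $\Omega(\beta, \epsilon, b)$ with
$\frac{\beta}{2} < \epsilon < \beta < \frac{1}{2}$, $\hat{Q}_4$ attains the minimax rate of convergence when $b > 0$. In contrast, $\hat{Q}_0 = 0$ attains
the minimax rate of convergence when $b \le 0$.

Remark 4. Similar to the sparse regime, there is no dependence on $\beta$ in the minimax rate of convergence
$\gamma_n(\beta, \epsilon, b)$ in the strongly dense regime, except for the requirement that $\frac{3\beta}{4} < \epsilon < \beta$. In contrast,
$\gamma_n(\beta, \epsilon, b)$ does depend explicitly on $\beta$ in the moderately dense regime $\frac{\beta}{2} < \epsilon \le \frac{3\beta}{4}$.

Remark 5. For either the sparse or dense regime, i.e., $0 < \epsilon < \beta$, the minimax rate of convergence of
$Q(\mu, \theta)$ over $\Omega(\beta, \epsilon, b)$ diverges to infinity when $b > \frac{2 - \epsilon}{6}$. Hence when the signal strength is too large,
it is impossible to consistently estimate $Q(\mu, \theta)$.

Remark 6. For either the sparse or dense regime, i.e., $0 < \epsilon < \beta$, when $r_n = s_n = \sigma\sqrt{d\log n}$ for some
$d > 0$, we have

$$\inf_{\hat{Q}} \sup_{\substack{\|\mu\|_0 \le k_n,\ \|\theta\|_0 \le k_n,\ \|\mu \star \theta\|_0 \le q_n, \\ \|\mu\|_\infty \le \sigma\sqrt{d\log n},\ \|\theta\|_\infty \le \sigma\sqrt{d\log n}}} E_{(\mu, \theta)}(\hat{Q} - Q(\mu, \theta))^2 \asymp n^{2\epsilon - 2}(\log n)^4.$$

The minimax rate of estimation is attained by $\hat{Q}_0$ and $\hat{Q}_2$ for $0 < \epsilon < \beta$, and by $\hat{Q}_4$ for $\frac{\beta}{2} < \epsilon < \beta$.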


2.3 Phase Transitions in the Minimax Rates of Convergence 


We see from Sections 2.1 and 2.2 that within each regime, the minimax rates of convergence exhibit
several phase transitions. In addition, each transition is governed by a change in the relative
magnitudes of the sparsity parameter $\beta$, the simultaneous sparsity parameter $\epsilon$, and the signal strength
parameter $b$. In fact, it is the way phase transitions occur within each regime that characterizes the
regime itself. Furthermore, the phase transitions actually display “continuity” across the boundaries
of different regimes.

To depict this graphically, first note that from Sections 2.1 and 2.2 the minimax rate
of convergence satisfies

$$R^*(n, \Omega(\beta, \epsilon, b)) \asymp n^{r(\beta, \epsilon, b)}, \tag{26}$$

modulo a factor involving $\log n$ when applicable. In Figure 1, we plot the rate exponent $r(\beta, \epsilon, b)$
against $b$ for the sparse, moderately dense, and strongly dense regimes.

Specifically, fixing $\beta = 0.45$, we plot $r(\beta, \epsilon, b)$ against $b$ for a range of $\epsilon$ values in $(0, \beta)$. The left
panel of Figure 1 provides a continuum view of $r(\beta, \epsilon, b)$, as $\epsilon$ increases from 0 to $\beta$. Each piecewise
straight line corresponds to an $\epsilon$ value in the considered range. To highlight the discrepancy among
the three regimes, we color the sparse regime ($0 < \epsilon < \frac{\beta}{2}$) in red, the moderately dense regime
($\frac{\beta}{2} < \epsilon \le \frac{3\beta}{4}$) in green, and the strongly dense regime ($\frac{3\beta}{4} < \epsilon < \beta$) in blue. We see that the three
regimes have somewhat different behaviors for small positive values of $b$. In particular, the sparse
regime and the strongly dense regime experience two transitions (three different slopes), while the
moderately dense regime experiences three transitions (four different slopes). Note that the difference
in the number of transitions is restored at the intersection of the blue region and the red region. Thus,
the phase transition is in some sense “continuous” across the regime boundaries: the piecewise
straight lines corresponding to the $r(\beta, \epsilon, b)$'s exhibit smooth transition as $\epsilon$ increases from 0 to $\beta$. The
right panel of Figure 1 provides a static view for each regime. We plot $r(\beta, \epsilon, b)$ against $b$ for three
values of $\epsilon$ corresponding to three different regimes: $\epsilon = 0.12$ (sparse regime), $\epsilon = 0.28$ (moderately
dense regime), and $\epsilon = 0.4$ (strongly dense regime).


Figure 1: Plot of the rate exponent $r(\beta, \epsilon, b)$ against the signal strength $b$. In the sparse regime (red),
$r(\beta, \epsilon, b)$ changes in the order $2\epsilon + 8b - 2$, $2\epsilon + 4b - 2$, $\epsilon + 6b - 2$. In the moderately dense regime (green),
$r(\beta, \epsilon, b)$ changes in the order $2\epsilon + 8b - 2$, $2\epsilon - 2$, $\beta + 4b - 2$, $\epsilon + 6b - 2$. In the strongly dense regime
(blue), $r(\beta, \epsilon, b)$ changes in the order $2\epsilon + 8b - 2$, $2\epsilon - 2$, $\epsilon + 6b - 2$. Left panel: a continuum view of
$r(\beta, \epsilon, b)$ as $\epsilon$ increases from 0 to $\beta = 0.45$ (color changes from red to blue). Right panel: a static
view of each regime: sparse ($\epsilon = 0.12$), moderately dense ($\epsilon = 0.28$), and strongly dense ($\epsilon = 0.4$).
Transition points are indicated by the knots on the dashed lines.
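
The piecewise-linear exponent plotted in Figure 1 can be written as a short Python function. This is a sketch under stated assumptions: the name `rate_exponent` is hypothetical, logarithmic factors are ignored, the slopes are those listed in the caption, and the breakpoints are placed where consecutive linear pieces agree. Assigning the boundary values $\epsilon = \beta/2$ and $\epsilon = 3\beta/4$ to the lower regime is a convention chosen here; the neighboring formulas coincide at those boundaries.

```python
def rate_exponent(beta, eps, b):
    """Exponent r(beta, eps, b) with minimax rate n^r, up to log factors."""
    if not (0 < eps < beta < 0.5):
        raise ValueError("expected 0 < eps < beta < 1/2")
    if b <= 0:
        return 2 * eps + 8 * b - 2            # weak signal: the zero estimator is optimal
    if eps <= beta / 2:                       # sparse regime
        if b <= eps / 2:
            return 2 * eps + 4 * b - 2
        return eps + 6 * b - 2
    if eps <= 3 * beta / 4:                   # moderately dense regime
        if b <= (2 * eps - beta) / 4:
            return 2 * eps - 2
        if b <= (beta - eps) / 2:
            return beta + 4 * b - 2
        return eps + 6 * b - 2
    if b <= eps / 6:                          # strongly dense regime
        return 2 * eps - 2
    return eps + 6 * b - 2


# One value of eps per regime, with beta fixed (binary-exact toy values).
beta = 0.375
for eps in (0.125, 0.25, 0.3125):
    print(eps, [rate_exponent(beta, eps, b) for b in (-0.0625, 0.03125, 0.0625, 0.25)])
```

For large $b$ every regime ends on the piece $\epsilon + 6b - 2$, which becomes positive once $b > (2 - \epsilon)/6$; beyond that point the rate no longer tends to zero.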

Interestingly, in the two-sequence case, the regions $\{b : b \le 0\}$ and $\{b : b > 0\}$ appear to constitute
the regions of weak signal and strong signal, respectively, regardless of the level of simultaneous
sparsity. This is in contrast to the one-sequence case where the dividing line is $b = 0$ when $k_n \ll \sqrt{n}$,
and $b = \frac{1 - 2\beta}{4}$ when $k_n \gg \sqrt{n}$. We caution that this apparent “reconciliation” in the two-sequence
case is simply because the signal strengths are taken to be the same for both sequences $\mu$ and $\theta$ in the
simplified results presented above.

Remark 7. When the signal strengths $r_n = n^{a}$ and $s_n = n^{b}$ of $\mu$ and $\theta$ are allowed to differ, it
turns out that $\{(a, b) : a \wedge b \le 0\}$ characterizes the region of weak signal when $q_n \ll \sqrt{k_n}$, while
$\{(a, b) : a \vee b \le 0\} \cup \{(a, b) : a \wedge b \le [...]\}$ comprises the region of weak signal when $q_n \gg \sqrt{k_n}$. We
refer the readers to Appendix A.1 for more details.


3 Detection 


There are strong connections between the problem of estimation and that of testing for quadratic
functionals in the single Gaussian sequence model setting. In this setting, the primary goal of the
testing problem is to distinguish between $\theta = 0$ and $\theta \neq 0$. Therefore, one can view the Gaussian
sequence model as a signal plus noise model, where $\theta = 0$ means that there is no signal, and testing
whether $\theta$ is nonzero amounts to a signal detection problem. It has been shown that test procedures
based on estimators of the quadratic functional $Q(\theta)$ can be effective in detecting signals under various
specifications of the parameter space (see, for example, Cai and Low (2005) and the references therein).

In this section, we explore the links between estimation and testing for quadratic functionals in
the Gaussian two-sequence model. In contrast to the one-sequence case, the main interest of testing in
the two-sequence case is to distinguish between $\mu \star \theta = 0$ and $\mu \star \theta \neq 0$, where $\mu \star \theta = (\mu_1\theta_1, \ldots, \mu_n\theta_n)$
is the coordinate-wise product of $\mu$ and $\theta$. Note that this is in effect a simultaneous signal detection
problem: we are only interested in the case where both sequences contain signal. As we shall see,
the estimators $\hat{Q}_2$ in (15) and $\hat{Q}_4$ in (21) can be useful in the construction of test procedures.

We will consider the hypothesis testing problem under an asymptotic minimax framework, where
the size $n$ of the mean vector $\theta$ is the driving variable. We first introduce some notation, which is
applicable to both the one-sequence and two-sequence testing problems, i.e., think of $\theta$ below as a
generic parameter. Consider the testing problem


$$H_0 : \theta \in \Theta_0(n), \qquad H_1 : \theta \in \Theta_1(n),$$

where $\Theta_0(n)$ and $\Theta_1(n)$ are parameter spaces whose specification depends on $n$, and $\Theta_0(n) \cap \Theta_1(n) = \emptyset$.
A test $\psi$ is a rule to accept or reject the null hypothesis based on the observed data. Therefore, it
is a measurable function of the observed data with values in $\{0, 1\}$. The value $\psi = 1$ means that we
reject $H_0$, and the value $\psi = 0$ means that we do not reject $H_0$. We measure the quality of a test $\psi$
by the sum of its maximal type I error (over $\Theta_0(n)$) and maximal type II error (over $\Theta_1(n)$):

$$S_n(\Theta_0(n), \Theta_1(n))(\psi) = \sup_{\theta \in \Theta_0(n)} E_\theta \psi + \sup_{\theta \in \Theta_1(n)} E_\theta(1 - \psi).$$

We define the minimax total error probability for the hypothesis testing problem as the infimum of
such total error probability over all tests:

$$S_n^*(\Theta_0(n), \Theta_1(n)) = \inf_{\psi} S_n(\Theta_0(n), \Theta_1(n))(\psi).$$

The goal is to establish the asymptotic detection boundary, i.e., the conditions on $\Theta_0(n)$ and $\Theta_1(n)$
which separate the undetectable region (where $S_n^*(\Theta_0(n), \Theta_1(n)) \to 1$ as $n \to \infty$) from the
detectable region (where $S_n^*(\Theta_0(n), \Theta_1(n)) \to 0$ as $n \to \infty$). In the interior of the undetectable region,
the signal is so weak that no test can successfully separate $H_0$ from $H_1$: the sum of maximal type I and
maximal type II errors of any test tends to one as $n$ tends to infinity. In contrast, in the interior
of the detectable region, it is possible to find a test whose sum of maximal type I and maximal type II
errors tends to zero as $n$ diverges. Along with the establishment of the asymptotic detection boundary,
we want to find a test $\psi^*$ which can perfectly distinguish between $H_0$ and $H_1$ when $n$ is large, i.e.,
$S_n(\Theta_0(n), \Theta_1(n))(\psi^*) \to 0$ as $n \to \infty$, in the detectable region.

We motivate our formulation of the simultaneous signal detection problem by the corresponding
framework that has been established for the one-sequence signal detection problem (Ingster 1997;
Hall and Jin 2010):

$$H_0 : \theta = 0, \qquad H_1 : \theta \in \Theta_1(\beta, b), \tag{27}$$

where

$$\Theta_1(\beta, b) = \{\theta \in \mathbb{R}^n : \|\theta\|_0 = k_n,\ \theta \in \{0, \pm s_n\}^n\}, \tag{28}$$

$k_n = n^\beta$ with $0 < \beta < 1$ and $s_n = n^b$ with $b \in \mathbb{R}$. In this formulation, there is no signal under
the null hypothesis, whereas signal is constrained in terms of both sparsity and magnitude under the
alternative hypothesis. Intuitively, for a fixed $\beta$, the signal detection problem becomes easier when $b$
increases. Similarly, for a fixed $b$, an increase in $\beta$ makes the detection problem easier. In fact, the
detection boundary under this framework is a mathematical formula describing the precise relationship
between sparsity and signal strength. It is a curve $b = \rho^*(\beta)$ that partitions $\{(\beta, b) : 0 < \beta < 1, b \in \mathbb{R}\}$
into two regions: the detectable region where $b > \rho^*(\beta)$, and the undetectable region where $b < \rho^*(\beta)$.

Similar to the problem of estimating the quadratic functional $Q(\theta)$ over the parameter space $\Theta(\beta, b)$,
the detection problem (27) behaves differently over two regimes. In the dense regime,
$\frac{1}{2} < \beta < 1$, and the detection boundary is

$$\rho^*(\beta) = \frac{1 - 2\beta}{4}.$$


A simple test based on the quadratic functional $\hat{Q}_3$ defined in (20) can be used here, by letting

Ip* = 1 (Q 3 > An), where An —)• 0 satisfies limsupn_ < 1. With such choice of An, 

$E_0 \psi^* + \sup_{\theta \in \Theta_1(\beta, b)} E_\theta(1 - \psi^*) \to 0$ whenever $b > \rho^*(\beta)$.

In the sparse regime, $0 < \beta < \frac{1}{2}$. The detection boundary is

$$\rho^*(\beta) = 0.$$

Note that $\rho^*(\beta)$ is independent of $\beta$ in this case. The explanation here is that we are not using
our microscope at the right resolution. A more refined analysis reveals that if we parametrize $s_n$ at
the order $s_n = \sigma\sqrt{d \log n}$, $d > 0$, rather than at the algebraic order $s_n = n^b$, $b \in \mathbb{R}$, then signal is
detectable whenever $d > \rho^*(\beta)$, where

$$\rho^*(\beta) = \begin{cases} 2(1 - \sqrt{\beta})^2 & \text{if } 0 < \beta \le \frac{1}{4}, \\ 1 - 2\beta & \text{if } \frac{1}{4} < \beta < \frac{1}{2}. \end{cases} \tag{29}$$
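As a small numerical companion to (29), the following Python sketch evaluates the refined sparse-regime boundary. It assumes the calibration $k_n = n^\beta$, $s_n = \sigma\sqrt{d \log n}$ and the piecewise form $2(1-\sqrt{\beta})^2$ for $\beta \le 1/4$ and $1 - 2\beta$ for $1/4 < \beta < 1/2$; the function names `rho_star_sparse` and `is_detectable` are ours and purely illustrative.

```python
import math


def rho_star_sparse(beta: float) -> float:
    """Boundary value of d for k_n = n**beta, s_n = sigma * sqrt(d * log n).

    Assumed piecewise form of (29); valid only in the sparse regime.
    """
    if not 0.0 < beta < 0.5:
        raise ValueError("the sparse regime requires 0 < beta < 1/2")
    if beta <= 0.25:
        return 2.0 * (1.0 - math.sqrt(beta)) ** 2
    return 1.0 - 2.0 * beta


def is_detectable(beta: float, d: float) -> bool:
    """True when d lies strictly above the boundary."""
    return d > rho_star_sparse(beta)
```

The two branches agree at $\beta = 1/4$ (both equal $1/2$), so the boundary is continuous there.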


The detection boundary (29) is the same as that given in Donoho and Jin (2004), where a mixture
model is used. A more general result extending to heteroscedastic normal mixtures can be found in
Cai, Jeng, and Jin (2011). The higher criticism test statistic is known to be optimally adaptive for
the detection problem under the mixture model framework as considered in Donoho and Jin (2004)
and Cai et al. (2011). Its optimal adaptivity is established in a regression context in Arias-Castro,
Candes, and Plan (2011), where the single Gaussian sequence signal detection (27) serves as a special
case, whereas the max-type test statistic is only optimal over the region $0 < \beta \le \frac{1}{4}$ (see, e.g., Theorem
1 in Arias-Castro et al. (2011)).


In the interest of connecting the testing problem with estimation of the quadratic functional, one might
wonder if some form of sum-of-squares type test statistic such as that used in the dense regime can be
effective in the sparse regime. A natural candidate to consider is the estimator $\hat{Q}_1$ of $Q(\theta)$ defined in
(14). Indeed, the test ip* = l(Qi > Xn), where A„ = can asymptotically distinguish between Hq 
and $H_1$ if $b > 0$, or if $s_n = \sigma\sqrt{d \log n}$ with $d$ sufficiently large. A rough analysis shows that the test
procedure works for $d > 4$. Since we only have an upper bound for the mean and variance of $\hat{Q}_1$ (and
not the exact values), it will be challenging if not impossible to derive the smallest possible value of
$d$ where $\psi^*$ can be effective.

To generalize the detection problem of the form (27), (28) to the two-sequence case, consider the
following parameter spaces:

$$\Omega_0(\beta, a, b) = \{(\mu, \theta) : \|\mu\|_0 \le k_n,\ \|\theta\|_0 \le k_n,\ \mu \in \{0, \pm r_n\}^n,\ \theta \in \{0, \pm s_n\}^n,\ \|\mu \star \theta\|_0 = 0\},$$

$$\Omega_1(\beta, \epsilon, a, b) = \{(\mu, \theta) : \|\mu\|_0 \le k_n,\ \|\theta\|_0 \le k_n,\ \mu \in \{0, \pm r_n\}^n,\ \theta \in \{0, \pm s_n\}^n,\ \|\mu \star \theta\|_0 = q_n\}, \tag{30}$$

where $k_n = n^\beta$, $q_n = n^\epsilon$ with $0 < \epsilon < \beta < \frac{1}{2}$, and $r_n = n^a$, $s_n = n^b$ with $a, b \in \mathbb{R}$. Suppose that we
want to test

$$H_0 : (\mu, \theta) \in \Omega_0(\beta, a, b), \qquad H_1 : (\mu, \theta) \in \Omega_1(\beta, \epsilon, a, b). \tag{31}$$

Essentially, we are testing whether $\mu \star \theta = 0$ on condition that $(\mu, \theta)$ satisfies the required sparsity and
signal strength constraints. It is, perhaps, unsurprising that the testing problem is characterized
by two regimes: the sparse regime where $0 < \epsilon < \frac{\beta}{2}$ and the dense regime where $\frac{\beta}{2} < \epsilon < \beta$. As the testing
problem now involves four parameters, $a$, $b$, $\beta$, and $\epsilon$, it is easier to describe the detectable versus
undetectable regions, instead of the detection boundary.

Theorem 5. We state the detectable and undetectable regions in two cases: 

(a) In the sparse regime, $0 < \epsilon < \frac{\beta}{2}$. The undetectable region is $\{(a, b) : a \wedge b < 0\}$, and the detectable
region is $\{(a, b) : a \wedge b > 0\}$. The test $\psi^* = 1(\hat{Q}_2 > \lambda_n)$, where $\hat{Q}_2$ is as defined in (15) with
Tn = logn, Xn = , asymptotically separates Hq from Hi over the detectable region. 

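(b) In the dense regime, $\frac{\beta}{2} < \epsilon < \beta$. The undetectable region is $\{(a, b) : a \wedge b < \frac{\beta - 2\epsilon}{4} \text{ or } a \vee b < 0\}$,
and the detectable region is $\{(a, b) : a \wedge b > \frac{\beta - 2\epsilon}{4} \text{ and } a \vee b > 0\}$. The test $\psi^* = 1(\hat{Q}_4 > \lambda_n)$,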


where Q 4 is as defined in (21) with Tn = 41ogn, Xn = asymptotically separates Hq 


from Hi over the detectable region. 

In Figure 2, we plot the detectable and the undetectable regions for both sparse and dense regimes.
The detectable region in the dense regime is larger than that in the sparse regime. Interestingly, in 
the dense regime, signal is detectable provided that the signal strength of at least one of the sequences 
is large enough (and the signal strength of the other sequence is not too weak). In contrast, in the 
sparse regime, signal is only detectable when both sequences admit sufficiently strong signals. 
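The regions in Theorem 5 can be encoded as a small classifier. The sketch below is illustrative only: the function name `detection_region` is ours, and the dense-regime threshold $(\beta - 2\epsilon)/4$ on $a \wedge b$ is an assumption taken from the weak-signal region described in Remark 7 and Appendix A.1. Points lying exactly on a dividing line are reported as `"boundary"`, since the algebraic calibration does not resolve them.

```python
def detection_region(a: float, b: float, beta: float, eps: float) -> str:
    """Classify (a, b) for r_n = n**a, s_n = n**b, k_n = n**beta, q_n = n**eps."""
    if not (0.0 < eps < beta < 0.5):
        raise ValueError("expected 0 < eps < beta < 1/2")
    if eps == beta / 2.0:
        raise ValueError("eps = beta/2 separates the sparse and dense regimes")
    lo, hi = min(a, b), max(a, b)
    if eps < beta / 2.0:
        # Sparse regime: both sequences need strong signal.
        if lo > 0.0:
            return "detectable"
        if lo < 0.0:
            return "undetectable"
        return "boundary"
    # Dense regime: one strong sequence suffices if the other is not too weak.
    threshold = (beta - 2.0 * eps) / 4.0  # assumed value; negative here
    if lo > threshold and hi > 0.0:
        return "detectable"
    if lo < threshold or hi < 0.0:
        return "undetectable"
    return "boundary"
```

For example, with $\beta = 0.45$ the pair $(a, b) = (0.1, -0.05)$ is detectable at $\epsilon = 0.4$ (dense) but not at $\epsilon = 0.1$ (sparse), which mirrors the larger shaded area in the dense panel.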


Note that the undetectable region for the detection problem (31) is essentially the same as the
region where $\hat{Q}_0 = 0$ attains the optimal rate of estimation for $Q(\mu, \theta)$ over the parameter space
$\Omega(\beta, \epsilon, a, b)$ defined in (36) of Appendix A.1. To see the connection, compare Figure 2 with Remark 7
given at the end of Section 2.3 and the results in Appendix A.1.


One may notice that the resolution of our microscope is not high enough in some sense: the
condition $a \wedge b = 0$ when $0 < \epsilon < \frac{\beta}{2}$ and the condition $a \vee b = 0$ when $\frac{\beta}{2} < \epsilon < \beta$ do not involve $\epsilon$ and
$\beta$. A more refined analysis may involve the parameterization $r_n \wedge s_n = \sigma\sqrt{c \log n}$ or $r_n \vee s_n = \sigma\sqrt{d \log n}$
for some $c, d > 0$. One can show that in such cases, the test procedures based on $\hat{Q}_2$ and $\hat{Q}_4$ are still
effective for $c, d$ sufficiently large. A detailed analysis of the exact detection boundary along this order
is beyond the scope of the paper.

Figure 2: Plot of detectable versus undetectable regions for the sparse regime (left) and the dense
regime (right). The shaded area corresponds to the detectable region, while the unshaded area
corresponds to the undetectable region.


4 Simulation 


In this section, we perform some simulation studies to compare the performance of the three estimators
$\hat{Q}_0 = 0$, $\hat{Q}_2$ as in (15), and $\hat{Q}_4$ as in (21), under different scenarios. We compute the mean squared error
(MSE) of the three estimators and show that our simulation results are compatible with the theoretical
results given in Section 2.

So far, we have assumed that the noise level $\sigma$ is known. In practice, $\sigma$ is typically unknown and
needs to be estimated. Under the sparse setting of the present paper, $\sigma$ is easily estimable. Denote
by $M \in \mathbb{R}^{2n}$ the vector with $M_{2i-1} = X_i$ and $M_{2i} = Y_i$ for $i = 1, \ldots, n$. A simple robust estimator of the noise
level $\sigma$ is the following median absolute deviation (MAD) estimator:

$$\hat{\sigma} = \frac{\mathrm{median}_j\,|M_j - \mathrm{median}_j(M_j)|}{0.6745}.$$
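The MAD estimator above takes only a few lines of NumPy. The sketch below is a minimal illustration; the function name `mad_sigma` is ours. Because the median does not depend on the ordering of the entries, concatenating the two samples gives the same value as interleaving them into $M$.

```python
import numpy as np


def mad_sigma(x, y) -> float:
    """MAD estimate of the common noise level from the pooled observations."""
    m = np.concatenate([np.asarray(x, dtype=float), np.asarray(y, dtype=float)])
    return float(np.median(np.abs(m - np.median(m))) / 0.6745)
```

When only a small fraction of the $2n$ pooled coordinates carries signal, the pooled median and the median absolute deviation are barely affected, which is why the estimate is robust in the sparse setting.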

We consider simulation studies over a range of sample size $n$, sparsity $k_n = n^\beta$, simultaneous

sparsity = rf, and signal strength Sn = ■ More specifically, we take n G {10^, lO'^,..., lO”^}, 

$\beta = 0.45$ for individual sequences, $b \in \{-0.1, 0.15, 0.2\}$, and three values of simultaneous sparsity, one
for each regime: $\epsilon = 0.02$ (sparse regime), $\epsilon = 0.3$ (moderately dense regime) and $\epsilon = 0.44$ (strongly
dense regime). Figure 3 is the plot of the MSE (averaged over 500 replications) of the three estimators
over different sample sizes (in the log-log scale), for each combination of simultaneous sparsity and
signal strength.
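A simulation of this kind can be sketched as follows. Several ingredients are assumptions rather than details taken from the text: we read the estimator in (15) as $\hat{Q}_2 = \frac{1}{n}\sum_i [(X_i^2 - \sigma^2\tau)_+ - \mu_0][(Y_i^2 - \sigma^2\tau)_+ - \theta_0]$ with $\tau = \log n$ (the form used in Section 6), we place all nonzero means at $+s_n$ with the $q_n$ shared coordinates first, and the names `theta0`, `q2_hat`, and `simulate_mse` are ours. The centering constant uses the closed form $E(Z^2 - \tau)_+ = 2[\sqrt{\tau}\,\phi(\sqrt{\tau}) - (\tau - 1)\bar{\Phi}(\sqrt{\tau})]$ for standard normal $Z$.

```python
import math

import numpy as np


def theta0(tau: float, sigma: float = 1.0) -> float:
    """E_0 (Y^2 - sigma^2 tau)_+ for Y ~ N(0, sigma^2), in closed form."""
    t = math.sqrt(tau)
    phi = math.exp(-0.5 * tau) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))  # P(Z > t)
    return 2.0 * sigma**2 * (t * phi - (tau - 1.0) * tail)


def q2_hat(x, y, sigma: float = 1.0, tau=None) -> float:
    """Assumed form of the thresholded product estimator with tau = log n."""
    n = x.size
    tau = math.log(n) if tau is None else tau
    t0 = theta0(tau, sigma)
    ax = np.maximum(x**2 - sigma**2 * tau, 0.0) - t0
    ay = np.maximum(y**2 - sigma**2 * tau, 0.0) - t0
    return float(np.mean(ax * ay))


def simulate_mse(n, beta, eps, b, reps=200, sigma=1.0, seed=0):
    """Monte Carlo MSE of the zero estimator and of q2_hat (hypothetical design)."""
    rng = np.random.default_rng(seed)
    k = max(1, round(n**beta))
    q = min(k, max(1, round(n**eps)))
    if 2 * k - q > n:
        raise ValueError("n too small for this layout")
    s = n**b
    mu = np.zeros(n)
    theta = np.zeros(n)
    mu[:k] = s
    theta[:q] = s               # q coordinates shared with mu
    theta[k:2 * k - q] = s      # remaining k - q nonzeros off the support of mu
    q_true = float(np.mean(mu**2 * theta**2))
    sq_err = []
    for _ in range(reps):
        x = mu + sigma * rng.standard_normal(n)
        y = theta + sigma * rng.standard_normal(n)
        sq_err.append((q2_hat(x, y, sigma) - q_true) ** 2)
    return {"Q0": q_true**2, "Q2": float(np.mean(sq_err))}
```

Calling `simulate_mse` over a grid of `n` and plotting the two entries on log-log axes gives curves of the same type as those described for Figure 3, though the exact values depend on the assumed design.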

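The theoretical results in Section 2 indicate that for $\hat{Q} = \hat{Q}_0$, $\hat{Q}_2$, or $\hat{Q}_4$,

$$\sup_{(\mu, \theta) \in \Omega(\beta, \epsilon, b)} E(\hat{Q} - Q(\mu, \theta))^2 \asymp n^{r(\beta, \epsilon, b)}$$

for some rate exponent $r(\beta, \epsilon, b)$ (modulo a logarithmic factor when applicable). Thus, it is not
surprising that the results in Figure 3 (mostly) exhibit a linear pattern. When the signal is weak with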
Figure 3: Plot of MSE for the estimators $\hat{Q}_0$ (solid), $\hat{Q}_2$ (dashed), and $\hat{Q}_4$ (dotted) over different
sample sizes $n$, in the log-log scale. The columns are ordered from left to right as
$\epsilon = 0.02$ (sparse regime), $\epsilon = 0.3$ (moderately dense regime), and $\epsilon = 0.44$ (strongly dense regime). The
rows are ordered from top to bottom in increasing signal strength: $b \in \{-0.1, 0.15, 0.2\}$. The horizontal
grey line corresponds to MSE = 1, and it serves to distinguish between MSE $\to 0$ (negative slope)
and MSE $\to \infty$ (positive slope).


$b = -0.1$ (see the first row of Figure 3), we see that $\hat{Q}_0$ (solid red line) and $\hat{Q}_4$ (dotted blue line)
have the lowest mean squared error. Note that we expect $\hat{Q}_0$ to be optimal when the signal is weak.
We observe from Figure 3 that $\hat{Q}_4$ is nearly as good as $\hat{Q}_0$. This is because when the signal is weak,
the thresholding step $1(X_i^2 \vee Y_i^2 > \sigma^2\tau_n)$ thresholds both noise and weak signals, and the de-bias
term $\eta$ is extremely small when $n$ is moderately large, resulting in $\hat{Q}_4 \approx \hat{Q}_0 = 0$. As the signal becomes
sufficiently strong ($b \in \{0.15, 0.2\}$), $\hat{Q}_2$ starts to dominate in the sparse regime ($\epsilon = 0.02$) while $\hat{Q}_4$
dominates in the moderately dense and strongly dense regimes ($\epsilon \in \{0.3, 0.44\}$). When the signal is
sufficiently large ($b \in \{0.15, 0.2\}$), $\hat{Q}_0$ is clearly suboptimal. In particular, in the case where the signal is
both dense and strong ($b = 0.2$, $\epsilon \in \{0.3, 0.44\}$), the MSE of $\hat{Q}_0$ diverges to infinity, as indicated by
the positive slope of the solid red line. Note also that as either $\epsilon$ or $b$ increases, the MSE increases, as
can be seen by the flattening or reversing of slopes towards the right end or bottom of the plot panel.
This is compatible with the fact that $r(\beta, \epsilon, b)$ increases with respect to both $\epsilon$ and $b$.
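Since the MSE is expected to scale like a power of $n$, an empirical rate exponent can be read off as the slope of a least-squares line on the log-log scale. The following sketch shows one way to do this; the function name `loglog_slope` is ours, and any logarithmic factor in the true rate will bias the fitted slope slightly at finite $n$.

```python
import numpy as np


def loglog_slope(ns, mses) -> float:
    """Least-squares slope of log10(MSE) against log10(n)."""
    log_n = np.log10(np.asarray(ns, dtype=float))
    log_mse = np.log10(np.asarray(mses, dtype=float))
    slope, _intercept = np.polyfit(log_n, log_mse, deg=1)
    return float(slope)
```

A fitted slope can then be compared with the theoretical exponent $r(\beta, \epsilon, b)$ for the relevant regime; a negative slope corresponds to MSE tending to zero and a positive slope to divergence.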


5 Discussion 

In this paper, we discuss the estimation of the quadratic functional $Q(\mu, \theta) = \frac{1}{n}\sum_{i=1}^n \mu_i^2\theta_i^2$ over a family
of parameter spaces where the magnitude and sparsity of both $\mu$ and $\theta$ are constrained. We show that
the minimax rates of convergence display interesting phase transitions over three distinct regimes: the 
sparse regime, the moderately dense regime, and the strongly dense regime. We also demonstrate 
an application of the estimators of the quadratic functional to a closely related simultaneous signal 
detection problem, and show that the resulting test procedures are effective in detecting simultaneous 
signals over the detectable region. Throughout our analysis, we highlight distinctions and similarities 
between the one-sequence and two-sequence problems, for both estimation and testing. 

It will be interesting to generalize our study of the two-sequence estimation problems in numerous
aspects. In Appendix A, we show that the optimal rates of estimation for $Q(\mu, \theta)$ continue to subsume
the aforementioned three regimes when $\mu$ and $\theta$ are allowed unequal signal strengths. Nonetheless, the
distinction between the sparse and dense regimes is more apparent in this setting. In the sparse regime,
estimation is only desirable when the signal strengths of both sequences are sufficiently strong. In
contrast, in the dense regime, estimation is desirable whenever at least one sequence admits sufficiently
strong signal (and the signal strength of the other sequence is not too weak). We also examine in
Appendix A.2 the behavior of the estimation problem when signal strength is incorporated through the
$\ell_2$-norm rather than the $\ell_\infty$-norm. Unlike the one-sequence estimation problem, in the two-sequence
case the minimax rates of convergence are to some extent degenerate under the $\ell_2$-norm constraint.
Thus, it is reasonable to suspect that the one-sequence and two-sequence estimation problems are
not as similar as they first appear. A more refined analysis of the characteristics of both one-sequence
and two-sequence problems demands an examination of their respective behaviors under the $\ell_p$-norm
constraint on the signal strength, for $p \in (0, \infty]$, and is beyond the scope of the paper.


6 Proofs of Theorem 1 and Theorem 2

In this section, we present the proofs of Theorem 1 and Theorem 2, which concern estimation results
of $Q(\mu, \theta)$ in the sparse regime. For reasons of space, we relegate the proofs of other main results in
the paper to Appendix B.

Henceforth, we omit the subscripts $n$ in $k_n$, $q_n$, $s_n$ and $\tau_n$ that signify their dependence on the
sample size. We denote by $\psi_\mu$ the density of a Gaussian distribution with mean $\mu$ and variance $\sigma^2$,
and we denote by $\ell(n, k)$ the class of all subsets of $\{1, \ldots, n\}$ of $k$ distinct elements. Finally, $c$ and $C$
denote constants that may vary for each occurrence.


6.1 Proof of Theorem 1

The proof of Theorem 1 involves a careful analysis of the bias and variance of the estimator $\hat{Q}_2$ (defined
in (15)). We need the following lemma from Cai and Low (2005) (Lemma 1, page 2939) for proving
Theorem 1.
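Lemma 1. Let $Y \sim N(\theta, \sigma^2)$ and let $\theta_0 = E_0(Y^2 - \sigma^2\tau)_+$, where the expectation is taken under $\theta = 0$.
Then for $\tau \ge 1$ and $\hat{\theta^2} = (Y^2 - \sigma^2\tau)_+ - \theta_0$,

$$\theta_0 \le \frac{4\sigma^2}{\sqrt{2\pi}\,\tau^{1/2} e^{\tau/2}},$$

$$|E(\hat{\theta^2}) - \theta^2| \le \min\{2\sigma^2\tau, \theta^2\},$$

$$\mathrm{Var}(\hat{\theta^2}) \le 6\sigma^2\theta^2 + \sigma^4\,\frac{4\tau^{1/2} + 18}{e^{\tau/2}}.$$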

Lemma 2 is an immediate consequence of Lemma 1.

Lemma 2. Let $Y \sim N(\theta, \sigma^2)$ and let $\theta_0 = E_0(Y^2 - \sigma^2\tau)_+$, where the expectation is taken under $\theta = 0$.

Then for $\tau \ge 1$,


{E(Y^ — — 9qY < + a' 


j4rV2 + 18 


, 100 ^ 


To streamline the presentation of the proof of Theorem 1, we defer the proof of Lemma 2 to the end
of Appendix B.2.


Proof of Theorem 1. We first bound the bias of the estimator $\hat{Q}_2$ defined in (15). Using the equality

$$AB - ab = (A - a)(B - b) + a(B - b) + b(A - a),$$

the independence of $X_i$ and $Y_i$, and the triangle inequality, we get
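$$
\begin{aligned}
&\big|E_{(\mu_i, \theta_i)}\{[(X_i^2 - \sigma^2\tau)_+ - \mu_0][(Y_i^2 - \sigma^2\tau)_+ - \theta_0]\} - \mu_i^2\theta_i^2\big| \\
&\quad \le \big|E_{\mu_i}[(X_i^2 - \sigma^2\tau)_+ - \mu_0] - \mu_i^2\big| \cdot \big|E_{\theta_i}[(Y_i^2 - \sigma^2\tau)_+ - \theta_0] - \theta_i^2\big| \\
&\qquad + \mu_i^2\,\big|E_{\theta_i}[(Y_i^2 - \sigma^2\tau)_+ - \theta_0] - \theta_i^2\big| + \theta_i^2\,\big|E_{\mu_i}[(X_i^2 - \sigma^2\tau)_+ - \mu_0] - \mu_i^2\big| \\
&\quad \le \min\{2\sigma^2\tau, \mu_i^2\}\min\{2\sigma^2\tau, \theta_i^2\} + \mu_i^2\min\{2\sigma^2\tau, \theta_i^2\} + \theta_i^2\min\{2\sigma^2\tau, \mu_i^2\} \\
&\quad \le 2\mu_i^2\min\{2\sigma^2\tau, \theta_i^2\} + 2\theta_i^2\min\{2\sigma^2\tau, \mu_i^2\},
\end{aligned}
$$

where the second inequality follows from Lemma 1. It follows that for $(\mu, \theta) \in \Omega(\beta, \epsilon, b)$ and $\tau \ge 1$,

$$
\begin{aligned}
|E_{(\mu, \theta)}(\hat{Q}_2) - Q(\mu, \theta)|
&= \Big|\frac{1}{n}\sum_{i=1}^n E_{(\mu_i, \theta_i)}\{[(X_i^2 - \sigma^2\tau)_+ - \mu_0][(Y_i^2 - \sigma^2\tau)_+ - \theta_0]\} - \frac{1}{n}\sum_{i=1}^n \mu_i^2\theta_i^2\Big| \\
&\le \frac{1}{n}\sum_{i=1}^n \big[2\mu_i^2\min\{2\sigma^2\tau, \theta_i^2\} + 2\theta_i^2\min\{2\sigma^2\tau, \mu_i^2\}\big] \\
&\le \frac{4}{n}\min\{2\sigma^2 q s^2\tau,\ q s^4\},
\end{aligned}
$$

where the second inequality follows from the fact that for $(\mu, \theta) \in \Omega(\beta, \epsilon, b)$, there are at most $q$ entries that
are simultaneously nonzero for $\mu$ and $\theta$.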

We now proceed to bounding the variance of $\hat{Q}_2$. Applying the equality

$$\mathrm{Var}(AB) = \mathrm{Var}(A)\mathrm{Var}(B) + [E(A)]^2\mathrm{Var}(B) + [E(B)]^2\mathrm{Var}(A),$$

which holds for independent random variables $A$ and $B$, for $\tau \ge 1$, we have

- t^o][{y^ - (^^t)+ - 0 o]} 

= Var^J(x 2 - ctV)+ - no]Ya.ig^[{Y^ - o-V)+ - 6 »o] 

+ [E^^iXf - aV)+ - /ro]'Var,J(y ,2 _ ^V)+ - ^o] 
+ [Eg^iYi^ - aV)+ - 0 o]'Var^J(x 2 _ ^ 2 ^)^ _ 


< 3 


2 2 , 44r^/2_^18 

6cr + cr 


ot /2 


2^2 , 44r^/2 + 18 


W 2 


+ 10 /i^ 


, 2 fl 2 , At^^ + IS' 


6Adf + A 


r/2 


+ 100 ' 


c_2 2 , _4 4r^/^ + 18 

^ An 


where the inequality follows from Lemma 1 and Lemma 2. Thus, for $(\mu, \theta) \in \Omega(\beta, \epsilon, b)$ and $\tau \ge 1$,


Var^^ 0 (Q 2 ) 
1 




- ^ 0 ]} 


Z =1 

O n r 

s 4 e 

i=l 


f, 2 2 , 44 t1/2_^18 

6ct + cr 


=r/2 


a^2n2 . a^t'/^ + IS 


+ „2 

i=l 


a^2n2 , 4 4t^/2 + 18 


AY 0 ‘ 

2 = 1 


^ 2 2 I ^ 

ba + a 


+ 18' 


3 

< ^ 


36(T'gs' + 12Aks^ 
bAqs^ + cr'A:s' 


20 

H—^ 




/4rV2^18' 

/4ri/2 ^ 18 

V vn 


4 r^/2 18 


Combining the bias and variance terms, we get, for $\tau \ge 1$,


sup L;(^_0)(Q2 - Q(/i,0))' 

(/X,0)er2(/3,e,b) 


c 

< ^ 


c 


min{g^s'r^, g^s®} + max |gs', gs®, ks^ ^ 


/4r^/2 18 


r/2 


fes' 


/4t 1/2_^18\ /4ri/2_^18 


■ 


r/2 


,n 


■ 


r/2 






^.^AEAnAJEAn 


\ e' 


-/2 


r/2 


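Suppose that $b > 0$. Then letting $\tau = \log n$ leads to

$$\sup_{(\mu, \theta) \in \Omega(\beta, \epsilon, b)} E_{(\mu, \theta)}(\hat{Q}_2 - Q(\mu, \theta))^2 \le C\max\{n^{2\epsilon + 4b - 2}(\log n)^2,\ n^{\epsilon + 6b - 2}\} \le C'\gamma_n(\beta, \epsilon, b),$$

where

$$\gamma_n(\beta, \epsilon, b) = \begin{cases} n^{2\epsilon + 4b - 2}(\log n)^2 & \text{if } 0 < b \le \frac{\epsilon}{2}, \\ n^{\epsilon + 6b - 2} & \text{if } b > \frac{\epsilon}{2}. \end{cases} \tag{32}$$

From the calculation above, one can also check that when $0 < \epsilon < \beta$ and $s = \sigma\sqrt{d \log n}$,

$$\sup_{\substack{(\mu, \theta):\ \|\mu\|_0 \le k_n,\ \|\theta\|_0 \le k_n,\ \|\mu \star \theta\|_0 \le q_n, \\ \|\mu\|_\infty \le \sigma\sqrt{d \log n},\ \|\theta\|_\infty \le \sigma\sqrt{d \log n}}} E_{(\mu, \theta)}(\hat{Q}_2 - Q(\mu, \theta))^2 \le C n^{2\epsilon - 2}(\log n)^4.$$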


□ 


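6.2 Proof of Theorem 2

To prove Theorem 2, it suffices to show that for $0 < \beta < \frac{1}{2}$,

$$
\gamma_n(\beta, \epsilon, b) \ge \left\{
\begin{array}{lll}
n^{2\epsilon + 4b - 2}(\log n)^2 & \text{if } b > 0, \text{ for } 0 < \epsilon < \frac{\beta}{2}, & \text{(Case 1)} \\
n^{2\epsilon + 8b - 2} & \text{if } b < 0, \text{ for } 0 < \epsilon < \beta, & \text{(Case 2)} \\
n^{\epsilon + 6b - 2} & \text{if } b > 0, \text{ for } 0 < \epsilon < \beta. & \text{(Case 3)}
\end{array}
\right.
$$

For individual regions in $\{(\beta, \epsilon, b) : 0 < \epsilon < \frac{\beta}{2}, 0 < \beta < \frac{1}{2}, b \in \mathbb{R}\}$, the minimax rate of convergence
is then given by the sharpest rate among all cases to which the region belongs. For instance,
the region $\{(\beta, \epsilon, b) : 0 < \epsilon < \frac{\beta}{2} < \beta < \frac{1}{2}, b > \frac{\epsilon}{2}\}$ is included in Case 1 and Case 3, hence

$$\gamma_n(\beta, \epsilon, b) \ge \max\{n^{2\epsilon + 4b - 2}(\log n)^2,\ n^{\epsilon + 6b - 2}\}.$$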

To establish the desired lower bounds, for each case we construct two priors $f$ and $g$ that have
a small chi-square distance but a large difference in the expected values of the resulting quadratic
functionals, and then apply the Constrained Risk Inequality (CRI) in Brown and Low (1996). The
choice of the priors $f$ and $g$ is crucial in deriving a sharp lower bound for the estimation problem. In
fact, the fundamental difference between different phases in the sparse regime for the estimation of
$Q(\mu, \theta)$ can be seen from the choice of $f$ and $g$. For some background on the lower bound technique, see
Appendix B.1.1.


Proof of Case 1. Our proof builds on arguments similar to those used in Cai and Low (2004) and
Baraud (2002), who considered the one-sequence estimation problem. We first follow the lines of the
proof of Theorem 7 in Cai and Low (2004), and then apply a result from Aldous (1985) as was done
in Baraud (2002). Let


$$f(x_1, \ldots, x_n, y_1, \ldots, y_n) = \prod_{i=1}^k \psi_s(x_i) \prod_{i=k+1}^n \psi_0(x_i) \prod_{i=1}^n \psi_0(y_i).$$

For $I \in \ell(k, q)$, let

$$g_I(x_1, \ldots, x_n, y_1, \ldots, y_n) = \prod_{i=1}^k \psi_s(x_i) \prod_{i=k+1}^n \psi_0(x_i) \prod_{i=1}^k \psi_{\theta_i}(y_i) \prod_{i=k+1}^n \psi_0(y_i),$$

where $\theta_i = \rho 1(i \in I)$ with $\rho > 0$, and let

$$g = \frac{1}{\binom{k}{q}} \sum_{I \in \ell(k, q)} g_I.$$

In both $f$ and $g$, the sequence $\mu = (s, \ldots, s, 0, \ldots, 0)$ is taken to be the same. However, $\theta$ is taken
to be all zeros in $f$ but is taken as a mixture in $g$. The nonzero coordinates of $\theta$ are mixed uniformly
over the support of $\mu$ at a common magnitude $\rho$, whose value is yet to be determined. Our choice of
$f$ and $g$ essentially reduces the two-sequence problem to the case where we only have one Gaussian
mean sequence of length $k$ with $q$ nonzero coordinates, which explains the correspondence between
the sparse regime in the two-sequence case ($q \ll \sqrt{k}$) and the sparse regime in the one-sequence case
($k \ll \sqrt{n}$).

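We now compute the chi-square affinity between $f$ and $g$, which bears the expression

$$\int \frac{g^2}{f} = \frac{1}{\binom{k}{q}^2} \sum_{I \in \ell(k, q)} \sum_{J \in \ell(k, q)} \int \frac{g_I g_J}{f}. \tag{33}$$

For $I, J \in \ell(k, q)$, let $m = \mathrm{Card}(I \cap J)$. Then

$$\int \frac{g_I g_J}{f} = \Big[\int \psi_0(y)\,dy\Big]^{k - 2q + m} \Big[\int \psi_\rho(y)\,dy\Big]^{2q - 2m} \Big[\int \frac{\psi_\rho^2(y)}{\psi_0(y)}\,dy\Big]^m = \exp\Big(\frac{m\rho^2}{\sigma^2}\Big).$$

It follows that

$$\int \frac{g^2}{f} = E\exp\Big(\frac{M\rho^2}{\sigma^2}\Big),$$

where $M$ has a hypergeometric distribution

$$P(M = m) = \frac{\binom{q}{m}\binom{k - q}{q - m}}{\binom{k}{q}}. \tag{34}$$

As shown in Aldous (1985), $M$ has the same distribution as the conditional expectation $E(\tilde{M} \mid \mathcal{B})$,
where $\tilde{M}$ is a Binomial$(q, \frac{q}{k})$ random variable and $\mathcal{B}$ is a suitable $\sigma$-algebra. Coupled with Jensen's
inequality, this implies that

$$\int \frac{g^2}{f} \le E\exp\Big(\frac{\tilde{M}\rho^2}{\sigma^2}\Big) = \Big(1 - \frac{q}{k} + \frac{q}{k}e^{\rho^2/\sigma^2}\Big)^q.$$

Taking $\rho = \sigma\sqrt{(\beta - 2\epsilon)\log n}$ gives

$$e^{\rho^2/\sigma^2} = n^{\beta - 2\epsilon} = \frac{k}{q^2},$$

hence

$$\int \frac{g^2}{f} \le \Big(1 + \frac{1}{q}\Big)^q \le e.$$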


Since $Q(\mu, \theta) = 0$ under $f$ and $Q(\mu, \theta) = \frac{1}{n}qs^2\rho^2$ under $g$, it follows from CRI that

$$R^*(n, \Omega(\beta, \epsilon, b)) \ge c\Big(\frac{1}{n}qs^2\rho^2\Big)^2 = c\,n^{2\epsilon + 4b - 2}(\log n)^2.$$

□
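The two key steps of this argument, the comparison between sampling without and with replacement and the choice of $\rho$, can be checked numerically for small $k$ and $q$. The sketch below is illustrative: the function names are ours, $t$ stands for $\rho^2/\sigma^2$, and the hypergeometric law is assumed to be that of $|I \cap J|$ for two independent uniform $q$-subsets of $\{1, \ldots, k\}$.

```python
import math


def affinity_hypergeom(k: int, q: int, t: float) -> float:
    """E exp(t * M) for M = |I ∩ J| with I, J uniform q-subsets of a k-set."""
    total = math.comb(k, q)
    acc = 0.0
    for m in range(q + 1):
        weight = math.comb(q, m) * math.comb(k - q, q - m)
        acc += weight * math.exp(t * m)
    return acc / total


def affinity_binomial_bound(k: int, q: int, t: float) -> float:
    """Moment generating function of Binomial(q, q/k) evaluated at t."""
    p = q / k
    return (1.0 - p + p * math.exp(t)) ** q
```

With $t = \log(k/q^2)$, which corresponds to $\rho^2 = \sigma^2(\beta - 2\epsilon)\log n$, the binomial bound equals $(1 + 1/q - q/k)^q$ and therefore stays below $e$, in line with the display above.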


Proof of Case 2. Let

$$f(x_1, \ldots, x_n, y_1, \ldots, y_n) = \prod_{i=1}^n \psi_0(x_i) \prod_{i=1}^n \psi_0(y_i).$$

For $I \in \ell(n, q)$, let

$$g_I(x_1, \ldots, x_n, y_1, \ldots, y_n) = \prod_{i=1}^n \psi_{\mu_i}(x_i) \prod_{i=1}^n \psi_{\theta_i}(y_i),$$

where $\mu_i = \theta_i = \rho 1(i \in I)$ with $\rho > 0$, and let

$$g = \frac{1}{\binom{n}{q}} \sum_{I \in \ell(n, q)} g_I.$$

Contrast the choice of $f$ and $g$ here with that used in the proof of Case 1. Rather than fixing $\mu$ and
mixing nonzero coordinates of $\theta$ over the support of $\mu$, in this case mixing is done over all $n$ positions
using nonzero coordinates of $\mu$ and $\theta$ simultaneously.

Similar calculation as that used in the proof of Case 1 yields 


Now take $\rho = s = n^b$. Since $b < 0$, it follows that when $n$ is sufficiently large,


(35) 


hence 


y < ( 1 + - I <e 


Since $Q(\mu, \theta) = 0$ under $f$, and $Q(\mu, \theta) = \frac{1}{n}q\rho^4$ under $g$, it follows from CRI that

$$R^*(n, \Omega(\beta, \epsilon, b)) \ge c\Big(\frac{1}{n}q\rho^4\Big)^2 = c\,n^{2\epsilon + 8b - 2}.$$


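In fact, when $0 < \epsilon < \beta < \frac{1}{2}$ and $s = \sigma\sqrt{d \log n}$ for some $d > 0$, we also have

$$\inf_{\hat{Q}} \sup_{\substack{(\mu, \theta):\ \|\mu\|_0 \le k_n,\ \|\theta\|_0 \le k_n,\ \|\mu \star \theta\|_0 \le q_n, \\ \|\mu\|_\infty \le \sigma\sqrt{d \log n},\ \|\theta\|_\infty \le \sigma\sqrt{d \log n}}} E_{(\mu, \theta)}(\hat{Q} - Q(\mu, \theta))^2 \ge c\,n^{2\epsilon - 2}(\log n)^4.$$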


This can be shown by letting p = a 


Y^^|^nin^d7(r^-^e)yk)^ in (35). 


□ 


Proof of Case 3. The priors used in this case are very different from those considered in the proofs of
Case 1 and Case 2. Let

$$f(x_1, \ldots, x_n, y_1, \ldots, y_n) = \prod_{i=1}^q \psi_s(x_i) \prod_{i=q+1}^n \psi_0(x_i) \prod_{i=1}^q \psi_s(y_i) \prod_{i=q+1}^n \psi_0(y_i),$$

and

$$g(x_1, \ldots, x_n, y_1, \ldots, y_n) = \prod_{i=1}^q \psi_s(x_i) \prod_{i=q+1}^n \psi_0(x_i) \prod_{i=1}^q \psi_{s - \delta}(y_i) \prod_{i=q+1}^n \psi_0(y_i),$$

where $0 < \delta < s$. Note that no mixing is performed in this case. Instead, we fix the sequence
$\mu = (s, \ldots, s, 0, \ldots, 0)$ in both $f$ and $g$, and perturb the nonzero entries of $\theta$ by a small amount $\delta$ in $g$.
This set of priors provides the sharpest rate for the case when the signal is strong, i.e., $s = n^b$ is large. The
intuition is that when $s$ is large, estimation of $Q(\mu, \theta)$ is most difficult due to the indistinguishability
between $\theta_i = s$ and $\theta_i = s - \delta$, where $\delta \to 0$.

The chi-square affinity between $f$ and $g$ is given by

$$\int \frac{g^2}{f} = \exp\Big(\frac{q\delta^2}{\sigma^2}\Big).$$

Let $\delta = \frac{\sigma}{\sqrt{q}}$. Then we have

$$\int \frac{g^2}{f} = e < \infty.$$

Since $Q(\mu, \theta) = \frac{1}{n}qs^4$ under $f$ and $Q(\mu, \theta) = \frac{1}{n}qs^2(s - \delta)^2$ under $g$, it follows from CRI that

$$R^*(n, \Omega(\beta, \epsilon, b)) \ge c\Big(\frac{1}{n}qs^2\big[s^2 - (s - \delta)^2\big]\Big)^2 = c\Big(\frac{2\sigma}{n}\sqrt{q}\,s^3\Big)^2(1 + o(1)) = c\,n^{\epsilon + 6b - 2}(1 + o(1)).$$

□
A Further Discussions


In the following, we expand on several of the topics discussed in the paper. In particular, we address
the estimation of $Q(\mu, \theta)$ over a more general parameter space than is considered in (10), allowing the
signal strengths of $\mu$ and $\theta$ to differ. We then examine the effect of replacing the $\ell_\infty$-norm constraint
on $\mu$ and $\theta$ with the $\ell_2$-norm constraint on the estimation problem.


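A.1 Estimation of $Q(\mu, \theta)$ with Different Signal Strengths

We consider in Section 2 the estimation of $Q(\mu, \theta) = \frac{1}{n}\sum_{i=1}^n \mu_i^2\theta_i^2$ over the parameter space (10) where
$j_n = k_n = n^\beta$ and $r_n = s_n = n^b$, with $0 < \epsilon < \beta < \frac{1}{2}$ and $b \in \mathbb{R}$. In this section, we present the
estimation result for $Q(\mu, \theta)$ with $j_n = k_n = n^\beta$, but allow $r_n$ and $s_n$ to differ. Specifically, we consider
the parameter space

$$\Omega(\beta, \epsilon, a, b) = \{(\mu, \theta) : \|\mu\|_0 \le k_n,\ \|\mu\|_\infty \le r_n,\ \|\theta\|_0 \le k_n,\ \|\theta\|_\infty \le s_n,\ \|\mu \star \theta\|_0 \le q_n\}, \tag{36}$$

where $k_n = n^\beta$, $q_n = n^\epsilon$ with $0 < \epsilon < \beta < \frac{1}{2}$, and $r_n = n^a$, $s_n = n^b$ with $a, b \in \mathbb{R}$.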

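As before, the estimation problem can be divided into three regimes: the sparse regime
($0 < \epsilon < \frac{\beta}{2}$), the moderately dense regime ($\frac{\beta}{2} < \epsilon < \frac{3\beta}{4}$), and the strongly dense regime ($\frac{3\beta}{4} < \epsilon < \beta$).
When $\mu$ and $\theta$ have different signal strengths, the minimax rates of convergence for $Q(\mu, \theta)$ exhibit
more elaborate phase transitions, though they still bear the familiar form

$$R^*(n, \Omega(\beta, \epsilon, a, b)) = \inf_{\hat{Q}} \sup_{(\mu, \theta) \in \Omega(\beta, \epsilon, a, b)} E_{(\mu, \theta)}(\hat{Q} - Q(\mu, \theta))^2 \asymp \gamma_n(\beta, \epsilon, a, b),$$

where $\gamma_n(\beta, \epsilon, a, b)$ is a function of $n$ indexed by $\beta$, $\epsilon$, $a$, and $b$.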

For readability, we summarize the corresponding $\gamma_n(\beta, \epsilon, a, b)$ in Table 1 (sparse regime), Table 2
(moderately dense regime), and Table 3 (strongly dense regime), respectively. In all three tables, we
added cases where $r_n = \sigma\sqrt{c \log n}$ (as the transition from $a < 0$ to $a > 0$), and $s_n = \sigma\sqrt{d \log n}$ (as
the transition from $b < 0$ to $b > 0$). Moreover, the minimax rates of convergence are attained by the
same estimators as before over the respective regimes, as shown by Theorem 6 and Theorem 7.

Although we do not present the result here due to its lengthiness, estimation of Q{^,9) for the 
case where no constraint is imposed on either sparsity or signal strength of /r and 9 can be analyzed 
analogously provided that the magnitude of the simultaneous sparsity e is compared to a if a > 6 , 
and to /3 if 6 > a, for the characterization of the sparse and dense regimes. 


Theorem 6 (Sparse Regime). Let $0 < \epsilon < \frac{\beta}{2}$ and $0 < \beta < \frac{1}{2}$. Then $\hat{Q}_2$ defined in (15) with
$\tau_n = \log n$ attains the minimax rate of convergence over $\Omega(\beta, \epsilon, a, b)$ for $(a, b) \in \{(a, b) : a \wedge b > 0\}$.
On the other hand, $\hat{Q}_0 = 0$ attains the minimax rate of convergence over $\Omega(\beta, \epsilon, a, b)$ for
$(a, b) \in \{(a, b) : a \wedge b < 0\}$.


Theorem 7 (Dense Regime). Let $\frac{\beta}{2} < \epsilon < \beta$ and $0 < \beta < \frac{1}{2}$. Then $\hat{Q}_4$ defined in (21) with $\tau_n = 4\log n$
attains the minimax rate of convergence over $\Omega(\beta, \epsilon, a, b)$ for $(a, b) \in \{(a, b) : a \vee b > 0 \text{ and } a \wedge b > \frac{\beta - 2\epsilon}{4}\}$.
On the other hand, $\hat{Q}_0 = 0$ attains the minimax rate of convergence over $\Omega(\beta, \epsilon, a, b)$
for $(a, b) \in \{(a, b) : a \vee b < 0 \text{ or } a \wedge b < \frac{\beta - 2\epsilon}{4}\}$.


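Remark 8. Whenever $r_n = \sigma\sqrt{c \log n}$ and $s_n = \sigma\sqrt{d \log n}$ for some $c, d > 0$, the minimax rate of
convergence is attained by $\hat{Q}_0$ and $\hat{Q}_2$ for $0 < \epsilon < \beta$, and also by $\hat{Q}_4$ for $\frac{\beta}{2} < \epsilon < \beta$.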


The shaded regions in the three tables represent the region where $\hat{Q}_0$ attains the minimax rate of
convergence. Thus, $\{(a, b) : a \wedge b < 0\}$ is shaded in Table 1, while $\{(a, b) : a \vee b < 0 \text{ or } a \wedge b < \frac{\beta - 2\epsilon}{4}\}$ is
shaded in Table 2 and Table 3. Some regions involving $r_n = \sigma\sqrt{c \log n}$ or $s_n = \sigma\sqrt{d \log n}$ are shaded
as well, as these represent the boundary of estimation, where we are indifferent in terms of estimating
$Q(\mu, \theta)$ by $\hat{Q}_0$ or $\hat{Q}_2$ in the sparse regime, and by $\hat{Q}_0$ or $\hat{Q}_4$ in the dense regime.

Note that the estimation result for the dense regime turns out to be more interesting
when $r_n$ and $s_n$ can differ. It seems that estimation is desirable whenever the signal strengths of both
sequences barely exceed some small threshold ($a \wedge b > \frac{\beta - 2\epsilon}{4}$, but $\beta - 2\epsilon < 0$ in this case) and at
least one sequence has sufficiently strong signal ($a \vee b > 0$). This is in contrast to the sparse regime,
where estimation is desirable only when the signal strengths of both sequences are sufficiently strong
($a \wedge b > 0$). The intuitive explanation is that in the dense regime, knowing that $\mu_i \neq 0$ (because of
large $X_i^2$) most often suggests that $\theta_i \neq 0$ too (even if $Y_i^2$ is small), and vice versa, so we cannot afford
to estimate $\mu_i^2\theta_i^2$ by 0 with this additional information. On the contrary, in the sparse regime, knowing
that $\mu_i \neq 0$ does not entail much about whether $\theta_i \neq 0$ due to the sparseness of simultaneously nonzero
coordinates. Therefore it is better to estimate $\mu_i^2\theta_i^2$ by 0 unless both $X_i^2$ and $Y_i^2$ are large.

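In fact, the minimax rates of convergence for the sparse regime are relatively simple to describe
when $r_n$ is not necessarily equal to $s_n$:

$$\gamma_n(\beta, \epsilon, a, b) = \begin{cases} n^{2\epsilon + 4a + 4b - 2} & \text{if } a \wedge b < 0, \\ n^{2\epsilon + 4(a \vee b) - 2}(\log n)^2 & \text{if } 0 < a \wedge b \le \frac{\epsilon}{2}, \\ n^{\epsilon + 4(a \vee b) + 2(a \wedge b) - 2} & \text{if } a \wedge b > \frac{\epsilon}{2}. \end{cases}$$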
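The piecewise rate for the sparse regime is easy to tabulate. The sketch below returns the exponent of $n$ together with the power of $\log n$; the function name `sparse_rate` is ours, the branch formulas are our reading of the display above ($2\epsilon + 4a + 4b - 2$, then $2\epsilon + 4(a \vee b) - 2$ with a $(\log n)^2$ factor, then $\epsilon + 4(a \vee b) + 2(a \wedge b) - 2$), and assigning the breakpoints $a \wedge b = 0$ and $a \wedge b = \epsilon/2$ to the lower branch is a convention of this sketch, not a claim about the boundary cases.

```python
def sparse_rate(a: float, b: float, eps: float):
    """Return (exponent of n, power of log n) for the sparse-regime rate."""
    lo, hi = min(a, b), max(a, b)
    if lo <= 0.0:
        return (2.0 * eps + 4.0 * a + 4.0 * b - 2.0, 0)
    if lo <= eps / 2.0:
        return (2.0 * eps + 4.0 * hi - 2.0, 2)
    return (eps + 4.0 * hi + 2.0 * lo - 2.0, 0)
```

Setting $a = b$ recovers the three exponents $2\epsilon + 8b - 2$, $2\epsilon + 4b - 2$, and $\epsilon + 6b - 2$ of the equal-strength case, and the exponent of $n$ is continuous across both breakpoints.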

Unfortunately, we do not have such an easy representation for the minimax rates of convergence in the
dense regime. Nonetheless, due to the two-dimensional nature of the estimation problem, we find 
tables useful not only in presenting the minimax rates of convergence but also in illustrating the 
regions with weak signals (i.e., the shaded regions). 
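$$
\begin{array}{l|llll}
 & b < 0 & s_n = \sigma\sqrt{d \log n} & 0 < b < \frac{\epsilon}{2} & b > \frac{\epsilon}{2} \\ \hline
a < 0 & n^{2\epsilon + 4a + 4b - 2} & n^{2\epsilon + 4a - 2}(\log n)^2 & n^{2\epsilon + 4a + 4b - 2} & n^{2\epsilon + 4a + 4b - 2} \\
r_n = \sigma\sqrt{c \log n} & n^{2\epsilon + 4b - 2}(\log n)^2 & n^{2\epsilon - 2}(\log n)^4 & n^{2\epsilon + 4b - 2}(\log n)^2 & n^{2\epsilon + 4b - 2}(\log n)^2 \\
0 < a < \frac{\epsilon}{2} & n^{2\epsilon + 4a + 4b - 2} & n^{2\epsilon + 4a - 2}(\log n)^2 & n^{2\epsilon + 4(a \vee b) - 2}(\log n)^2 & n^{2\epsilon + 4b - 2}(\log n)^2 \\
a > \frac{\epsilon}{2} & n^{2\epsilon + 4a + 4b - 2} & n^{2\epsilon + 4a - 2}(\log n)^2 & n^{2\epsilon + 4a - 2}(\log n)^2 & n^{\epsilon + 4(a \vee b) + 2(a \wedge b) - 2}
\end{array}
$$

Table 1: Minimax rates of convergence in the sparse regime: $0 < \epsilon < \frac{\beta}{2}$.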

6 < 

< ft < 0 

•Sn = (T^/dlogn 

0 < 6 < ^7^ 

<b< 

4 ^ ^ — 2 

TTTS 

^ j3—2e 
a < 4 

^2e+4a+4b-2 

^2e+4a+4b—2 

n^'^'''4“-2(logn)2 

^2e+4a+4fo—2 

^2e+4a+46-2 

^2e+4a+4b-2 

< a < 0 

^2eH-4a+4b—2 

^2eH-4a.+4b—2 

n^^'’'4“-2(logn)2 

maxln^"*"^^”^, 

^/3+4&—2 

^/3+4fo—2 





^2.+4a-2(iog^): 



?’n = ayjclogn 

^2€+4b-2^Yog n)^ 

j^2e+4b-2QQg j^^2 

n^^“^(log n)^ 

n^’^“^(logn)^ 

^/3+46-2 

^/3+4fo—2 

0 < a < ^7^ 

^2e+4a+4b-2 

max{n^+'‘““^, 

n^^“^(logn)"^ 

n^'^“^(logn)^ 

^/3+46-2 

^/3+4fe-2 



^2e+4fe-2^jQg^^2| 





< a < 

^2e+4a+4b-2 

^/3+4a—2 

^/3+4a—2 

^/3+4a—2 

^/3+4aV6—2 

^/3+4fe-2 


^2€H-4a+4b—2 

^/3+4a—2 

^/3+4a—2 

^/3+4a—2 

^/3+4a—2 

^e-|-4fjV fo-l-2QrA6—2 


Table 2: Minimax rates of convergence in the moderately dense regime: § < e < ^. In this case, we have ^^^ 4 ^ < 



ft< A" 

^7' < ft < 0 

Sn = <7V log n 

0 < ft < 7^ 

< ft < 


a < 

4 

^2e+4a+4fo—2 

^2e+4ct+4fo—2 

^2e+4a-2|-jQg^^2 

^2e+4a+4fe—2 

^2e+4a+4b-2 

^2e+4a4-4&—2 

P-2e , ^ n 

< a < 0 

^2e+4a+4fo—2 

^2e+4a+4fo—2 

^2e+4a-2|-jQg^^2 

maxln^"*"^^”^, 

max{ 

^/3+4b-2 






^2.+4a-2(iog^) 

2} n2^+^“-7logn)7 


Tn = 

= cj\/clogn 

^2e+4b-2j'jQg 

^2e+4b-2j'jQg^^2 

n^^“^(logn)^ 

n^'^“^(logn)^ 

n^'^“^(logn)^ 

^/3+4b-2 

o 

A 

a<^ 

^2e+4a+4fo—2 

max{A+^“-2, 

n^^“^(logn)^ 

n^'^“^(logn)^ 

n^'^“^(logn)^ 

^/3+4&—2 




^2.+4&-2(iog„)2} 




/3-c 

2 

< a < ^7^ 

^2e+4ct+4fo—2 

max{A+^“-2, 

2(iog^)4 

n^'^“^(logn)^ 

max{n^'^“^(logn)^, 

^e+2a+4fe—2 




^2e+4b 2nQg^N2| 


^e+4aVfe+2aA&—21 


a > 

2e-/3 

4 

^2e+4ct+4fo—2 

^/3+4a—2 

^/3+4a—2 

^/3+4a—2 

^e+4a+26—2 

^eH-4c[.V b-l-2(3.Afe— 2 


Table 3: Minimax rates of convergence in the strongly dense regime: ^ < e < /3. In this case, we have 






A.2 Estimation of $Q(\mu,\theta)$ with $\ell_2$-norm Constraint on $\mu$ and $\theta$

In the main text, we consider the estimation of $Q(\mu,\theta)$ over a parameter space whose signal strength
is encoded by the $\ell_\infty$-norm. Despite our explicit interest in parameter spaces with rare signals, this
choice of norm seems somewhat arbitrary. One might ask whether the estimation problem would exhibit
different behavior had we chosen a different norm. We have a partial answer. Consider the
following family of parameter spaces, where signal strength is expressed in terms of the $\ell_2$-norm rather
than the $\ell_\infty$-norm:

$$\Omega(\alpha,\beta,\epsilon,a,b) = \{(\mu,\theta) \in \mathbb{R}^n \times \mathbb{R}^n : \|\mu\|_0 \le j_n,\ \|\mu\|_2 \le r_n,\ \|\theta\|_0 \le k_n,\ \|\theta\|_2 \le s_n,\ \|\mu \star \theta\|_0 \le q_n\},$$

where $j_n = n^\alpha$, $k_n = n^\beta$, $q_n = n^\epsilon$, $0 < \epsilon \le \alpha \wedge \beta < \frac{1}{2}$ and $r_n = n^a$, $s_n = n^b$, $a, b \in \mathbb{R}$. Because the
estimation result is easy to present in this case, we allow different levels of both sparsity and signal
strength for $\mu$ and $\theta$ for greater generality. The reader may compare the estimation results in this
case with those presented in Section A.1, where we also allow $r_n$ and $s_n$ to differ but under the $\ell_\infty$-norm
constraint. It turns out that the estimation problem under the $\ell_2$-norm constraint is in some sense
degenerate and not as meaningful as that under the $\ell_\infty$-norm constraint. We have


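$$\inf_{\hat Q}\ \sup_{(\mu,\theta)\in\Omega(\alpha,\beta,\epsilon,a,b)} E_{(\mu,\theta)}(\hat Q - Q(\mu,\theta))^2 \asymp \gamma_n(\alpha,\beta,\epsilon,a,b),$$

where

$$\gamma_n(\alpha,\beta,\epsilon,a,b) = \begin{cases} n^{4a+4b-2} & \text{if } a \wedge b \le 0, \\ n^{4(a\vee b)+2(a\wedge b)-2} & \text{if } a \wedge b > 0. \end{cases}$$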


Note that the minimax rates of convergence here are independent of $\alpha$, $\beta$, and $\epsilon$. A lower bound
analysis reveals that the worst case for the estimation problem happens when the signal is concentrated
at one coordinate, i.e., $\mu_i = r_n$, $\theta_i = s_n$ for some $i$ and $\mu_j = \theta_j = 0$ for all $j \neq i$. The estimator $\hat Q_0 = 0$
is optimal when $a \wedge b \le 0$, while the estimator $\hat Q_2$ defined in (15) is optimal when $a \wedge b > 0$.

On the contrary, for the estimation of $Q(\theta)$ in the one-sequence case, if we consider the following
family of parameter spaces

$$\Theta(\beta,b) = \{\theta \in \mathbb{R}^n : \|\theta\|_0 \le k_n,\ \|\theta\|_2 \le s_n\},$$

where $k_n = n^\beta$, $0 < \beta < 1$ and $s_n = n^b$, $b \in \mathbb{R}$, then the minimax rates of estimation exhibit similar
behavior as in the case with the $\ell_\infty$-norm constraint. As before, we have


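$$\inf_{\hat Q}\ \sup_{\theta\in\Theta(\beta,b)} E_\theta(\hat Q - Q(\theta))^2 \asymp \gamma_n(\beta,b),$$

where

$$\gamma_n(\beta,b) = \begin{cases} n^{4b-2} & \text{if } b \le \beta/2, \\ n^{2\beta-2}(\log n)^2 & \text{if } \beta/2 < b \le \beta, \\ n^{2b-2} & \text{if } b > \beta, \end{cases} \qquad (37)$$

when $0 < \beta \le \frac{1}{2}$, and

$$\gamma_n(\beta,b) = \begin{cases} n^{4b-2} & \text{if } b \le \frac{1}{4}, \\ n^{-1} & \text{if } \frac{1}{4} < b \le \frac{1}{2}, \\ n^{2b-2} & \text{if } b > \frac{1}{2}, \end{cases} \qquad (38)$$

when $\frac{1}{2} < \beta < 1$. Replacing $b$ by $\beta/2 + b$ in (37) and (38) yields the rates given in (13) and (19), respectively.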
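The idea here is to link the $\ell_2$-norm constraint to the $\ell_\infty$-norm constraint by setting the squared $\ell_2$ radius
equal to $k_n s_n^2$, with $s_n$ the $\ell_\infty$ bound, since then we essentially impose the same amount of constraint
on the maximal value of $Q(\theta) = \frac{1}{n}\sum_{i=1}^n \theta_i^2$.

Nevertheless, in the two-sequence case, the $\ell_2$-norm constraint translates to $Q(\mu,\theta) = \frac{1}{n}\sum_{i=1}^n \mu_i^2\theta_i^2 \le \frac{1}{n} r_n^2 s_n^2$
(equality holds if $\mu_i = r_n$, $\theta_i = s_n$ for some $i$ and $\mu_j = \theta_j = 0$ for all $j \neq i$), which is quite
different from $Q(\mu,\theta) = \frac{1}{n}\sum_{i=1}^n \mu_i^2\theta_i^2 \le \frac{1}{n} q_n r_n^2 s_n^2$ as translated by the $\ell_\infty$-norm constraint. It does not
seem sensible to link the two constraints by equating the squared $\ell_2$ radii with $j_n r_n^2$ and $k_n s_n^2$, since these
relationships are specific to the values of $Q(\mu) = \frac{1}{n}\sum_{i=1}^n \mu_i^2$ and $Q(\theta) = \frac{1}{n}\sum_{i=1}^n \theta_i^2$. It is not immediately clear how the
two constraints on the value of $Q(\mu,\theta)$ should be related.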

The disparity in the behavior of the minimax rates of convergence between the one-sequence and
two-sequence cases under the $\ell_2$-norm constraint seems to suggest that the intrinsic nature of the problem of
estimating a quadratic functional differs somewhat between the two cases, at least for the specific family
of parameter spaces that we consider. To better spell out the distinctions and similarities between the
two cases, a more refined analysis of the estimation problem under an $\ell_p$-norm constraint on the signal
strength for $p \in (0, \infty]$ seems necessary. We suspect a smooth and monotone relationship between $p$
and the degeneracy of the estimation problem in both the one-sequence and the two-sequence cases,
though the corresponding range of $p$ where the transition occurs might be different. In fact, it is easy to
check that for the one-sequence case, estimation of the quadratic functional is degenerate when $0 < p < 1$,
and is meaningful when $p \ge 2$. On the contrary, for the two-sequence case, estimation of the quadratic
functional is degenerate when $0 < p \le 2$. An enumeration of results for all $p \in (0, \infty]$ is beyond the
scope of the paper.


B Additional Proofs 


In this section, we present the proofs of Theorem 3, Theorem 4 and Theorem 5. Proofs of lower
bounds are given in Section B.1, followed by proofs of upper bounds in Section B.2.


B.1 Proofs of Lower Bounds

We first prove Theorem 4, which constitutes the lower bound for the estimation rate of $Q(\mu,\theta)$ in the
dense regime. We then prove the minimax lower bound for the hypothesis testing problem considered in
Theorem 5 to characterize the undetectable region. We begin with some technical tools for establishing
lower bounds.


B.1.1 General Tools 

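Let $\mathcal{P}$ be a set of probability measures on a measurable space $(\mathcal{X}, \mathcal{A})$, and let $\theta : \mathcal{P} \to \mathbb{R}$. For
$P_f, P_g \in \mathcal{P}$, let $\theta_f = \theta(P_f)$, $\theta_g = \theta(P_g)$, and let $f$, $g$ denote the densities of $P_f$, $P_g$ with respect to some
dominating measure $\nu$. The chi-square affinity between $P_f$ and $P_g$ is defined as

$$\zeta = \zeta(P_f,P_g) = \int \frac{g^2}{f}\, d\nu.$$

In particular, for Gaussian distributions, we have

$$\zeta(N(\theta_0,\sigma^2), N(\theta_1,\sigma^2)) = \exp\left(\frac{(\theta_1-\theta_0)^2}{\sigma^2}\right).$$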
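The Gaussian chi-square affinity has the standard closed form $\exp((\theta_1-\theta_0)^2/\sigma^2)$, which can be checked numerically. The sketch below is illustrative only: the function names are assumptions, and the integral of $g^2/f$ is approximated with a plain trapezoidal rule on a window around $2\theta_1-\theta_0$, where the integrand concentrates.

```python
import math


def log_normal_pdf(x, mean, sigma):
    """Log-density of N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def chi_square_affinity(theta0, theta1, sigma, half_width=12.0, steps=4000):
    """Trapezoidal approximation of the integral of g^2 / f,
    with f = N(theta0, sigma^2) and g = N(theta1, sigma^2)."""
    centre = 2.0 * theta1 - theta0  # g^2/f is Gaussian-shaped around this point
    lo = centre - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # Work on the log scale to avoid dividing by a tiny density.
        total += weight * math.exp(
            2.0 * log_normal_pdf(x, theta1, sigma) - log_normal_pdf(x, theta0, sigma)
        )
    return total * h


def gaussian_affinity_closed_form(theta0, theta1, sigma):
    """Closed form exp((theta1 - theta0)^2 / sigma^2)."""
    return math.exp((theta1 - theta0) ** 2 / sigma ** 2)
```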


Throughout, lower bounds are established by constructing two priors which have a small
chi-square distance but a large difference in the expected values of the resulting quadratic functionals,
and then applying the Constrained Risk Inequality (CRI) of Brown and Low (1996). Essentially,
CRI says that if $P_f$ and $P_g$ are such that $\theta_f, \theta_g \in \Theta$, the parameter space of estimation, with $\zeta =
\zeta(P_f,P_g) < \infty$, then for any estimator $\delta$ of $\theta = \theta(P) \in \Theta$ based on the random variable $X$ with
distribution $P$, we have

$$\sup_{\theta\in\Theta} E_\theta(\delta(X) - \theta)^2 \ge \frac{(\theta_g - \theta_f)^2}{(1 + \zeta^{1/2})^2}.$$

It follows that to establish a lower bound for the estimation rate, it suffices to find $P_f$ and $P_g$ such that
$(\theta_g - \theta_f)^2$ is as large as possible subject to $\zeta(P_f,P_g) < \infty$.

Hereafter, we omit the subscript $n$ in quantities such as $k_n$, $q_n$ and $s_n$, which signifies their dependence on the
sample size. We denote by $\psi_\mu$ the density of a Gaussian distribution with mean $\mu$ and variance $\sigma^2$,
and we denote by $\ell(n,k)$ the class of all subsets of $\{1,\ldots,n\}$ of $k$ distinct elements. Also, we remind
the readers that

$$R^*(n,\Omega(\beta,\epsilon,b)) = \inf_{\hat Q}\ \sup_{(\mu,\theta)\in\Omega(\beta,\epsilon,b)} E_{(\mu,\theta)}(\hat Q - Q(\mu,\theta))^2.$$

Finally, $c$ and $C$ denote constants that may vary for each occurrence.


B.1.2 Proof of Theorem 4

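To prove Theorem 4, it is sufficient to show that for $0 < \beta < \frac{1}{2}$,

$$\gamma_n(\beta,\epsilon,b) \ge \begin{cases} n^{2\epsilon+8b-2} & \text{if } b \le 0, \text{ for } 0 < \epsilon \le \beta, & \text{(Case 2)} \\ n^{\epsilon+6b-2} & \text{if } b > 0, \text{ for } 0 < \epsilon \le \beta, & \text{(Case 3)} \\ n^{\beta+4b-2} & \text{if } b > 0, \text{ for } \beta/2 < \epsilon \le \beta, & \text{(Case 4)} \\ n^{2\epsilon-2}(\log n)^4 & \text{if } b > 0, \text{ for } 0 < \epsilon \le \beta. & \text{(Case 5)} \end{cases}$$

The proofs of Case 2 and Case 3 can be found in Section 6.2, hence we only provide proofs of Case
4 and Case 5 below. For individual regions in $\{(\beta,\epsilon,b) : \beta/2 < \epsilon \le \beta < \frac{1}{2},\ b \in \mathbb{R}\}$, the minimax rate
of convergence is obtained as the sharpest rate among all cases to which the region belongs. For
instance, the region $\{(\beta,\epsilon,b) : 3\beta/4 < \epsilon \le \beta < \frac{1}{2},\ b > \epsilon/6\}$ is included in Case 3, Case 4 and Case 5, hence
$\gamma_n(\beta,\epsilon,b) \ge \max\{n^{\epsilon+6b-2},\ n^{\beta+4b-2},\ n^{2\epsilon-2}(\log n)^4\} = n^{\epsilon+6b-2}$.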

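Proof of Case 4. The proof of Case 4 is very similar to the proof of Case 1, except that a slightly
different mixture prior $g$ is employed. Let

$$f(x_1,\ldots,x_n,y_1,\ldots,y_n) = \prod_{i=1}^{k}\psi_s(x_i)\prod_{i=k+1}^{n}\psi_0(x_i)\prod_{i=1}^{n}\psi_0(y_i).$$

For $I \in \ell(k,q)$, let

$$g_I(x_1,\ldots,x_n,y_1,\ldots,y_n) = \prod_{i=1}^{k}\psi_s(x_i)\prod_{i=k+1}^{n}\psi_0(x_i)\prod_{i=1}^{k}\left[\tfrac{1}{2}\psi_{\theta_i}(y_i) + \tfrac{1}{2}\psi_{-\theta_i}(y_i)\right]\prod_{i=k+1}^{n}\psi_0(y_i),$$

where $\theta_i = \rho 1(i \in I)$ with $\rho > 0$, and let

$$g = \binom{k}{q}^{-1}\sum_{I\in\ell(k,q)} g_I.$$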

Note that in constructing $g$, mixing is done not only over all possible subsets in $\ell(k,q)$ but also over the
signs of the $\theta_i$'s. This has largely to do with the intuition that when signal is abundant, uncertainty about
the signs of the $\theta_i$'s further increases the difficulty of the estimation problem. That being said, mixing
without sign flips (i.e., simply using the priors $f$ and $g$ as given in the proof of Case 1) does not give us
the tightest lower bound. Similar to Case 1, keeping $\mu = (s,\ldots,s,0,\ldots,0)$ the same in both $f$ and
$g$ essentially reduces the two-sequence problem to a one-sequence problem. Our choice of priors is
equivalent to having only one Gaussian mean sequence of length $k$ with $q$ nonzero entries; thus the
correspondence between the dense regime in the two-sequence case ($q > \sqrt{k}$) and the dense regime in
the one-sequence case ($k > \sqrt{n}$).

Again, the chi-square affinity between $f$ and $g$ has the form (33), where for $I, J \in \ell(k,q)$ with
$m = \mathrm{Card}(I \cap J)$,

$$\int \frac{g_I g_J}{f} = \prod_{i=1}^{k}\int \frac{1}{4}\,\frac{[\psi_{\rho 1(i\in I)}(y_i) + \psi_{-\rho 1(i\in I)}(y_i)][\psi_{\rho 1(i\in J)}(y_i) + \psi_{-\rho 1(i\in J)}(y_i)]}{\psi_0(y_i)}\, dy_i$$

$$= \prod_{i\in I\cap J}\frac{1}{4}\int \frac{\psi_\rho^2(y_i) + \psi_{-\rho}^2(y_i) + 2\psi_\rho(y_i)\psi_{-\rho}(y_i)}{\psi_0(y_i)}\, dy_i$$

$$= \prod_{i\in I\cap J}\frac{1}{2}\left[\exp(\rho^2/\sigma^2) + \exp(-\rho^2/\sigma^2)\right]$$

$$= \cosh(\rho^2/\sigma^2)^m.$$

It follows that

$$\int \frac{g^2}{f} = E[\cosh(\rho^2/\sigma^2)^M],$$

where $M$ follows the hypergeometric distribution as in (34). Since $M$ coincides in distribution with the
conditional expectation $E(\tilde M \mid \mathcal{B})$, where $\tilde M$ is a Binomial$(q, q/k)$ random variable and $\mathcal{B}$ is a suitable
$\sigma$-algebra (Aldous, 1985), with Jensen's inequality we get

$$\int \frac{g^2}{f} \le E[\cosh(\rho^2/\sigma^2)^{\tilde M}] = \left(1 + \frac{q}{k}[\cosh(\rho^2/\sigma^2) - 1]\right)^q.$$
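The comparison between the hypergeometric expectation and its binomial upper bound can be checked exactly for small $k$ and $q$. The sketch below is illustrative and not part of the proof: the function names are assumptions, $M$ is taken to be the size of the overlap of two independent uniform $q$-subsets of $\{1,\ldots,k\}$, and $x$ stands for $\rho^2/\sigma^2$.

```python
import math


def hypergeom_mgf(k, q, t):
    """E[t^M] where M = |I ∩ J| for independent uniform q-subsets I, J of {1..k}."""
    denom = math.comb(k, q)
    total = 0.0
    for m in range(max(0, 2 * q - k), q + 1):
        # P(M = m) = C(q, m) * C(k - q, q - m) / C(k, q)
        total += math.comb(q, m) * math.comb(k - q, q - m) / denom * t ** m
    return total


def binomial_bound(k, q, t):
    """E[t^B] for B ~ Binomial(q, q/k), i.e. (1 + (q/k)(t - 1))^q."""
    return (1.0 + (q / k) * (t - 1.0)) ** q


def affinity_bounds(k, q, x):
    """Return (exact hypergeometric value, binomial bound) for t = cosh(x)."""
    t = math.cosh(x)
    return hypergeom_mgf(k, q, t), binomial_bound(k, q, t)
```

For example, `affinity_bounds(400, 30, 400 ** 0.5 / 30)` evaluates both quantities at $x = \sqrt{k}/q$, a scaling at which the bound stays below $e$.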


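Since $\cosh(x) = \frac{1}{2}(e^x + e^{-x}) = 1 + \frac{x^2}{2} + o(x^2)$ as $x \to 0$, taking $x = \rho^2/\sigma^2$ with $\rho = \sigma(k/q^2)^{1/4}$ yields

$$\int \frac{g^2}{f} \le \left(1 + \frac{q}{k}\cdot\frac{\rho^4}{2\sigma^4}(1 + o(1))\right)^q = \left(1 + \frac{1 + o(1)}{2q}\right)^q \le e$$

for $n$ sufficiently large. Since $Q(\mu,\theta) = 0$ under $f$ and $Q(\mu,\theta) = \frac{1}{n}qs^2\rho^2$ under $g$, it follows from CRI that

$$R^*(n,\Omega(\beta,\epsilon,b)) \ge c\left(\frac{1}{n}qs^2\rho^2\right)^2 = c\,\sigma^4\frac{ks^4}{n^2} = c\,\sigma^4 n^{\beta+4b-2}.$$

□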


Proof of Case 5. Let $f$ and $g$ be as given in the proof of Case 2 in Section 6.2, and take $\rho =
\sigma\sqrt{\frac{1}{2}(1 - 2\epsilon)\log n}$ in (35). It follows that when $n$ is sufficiently large,

$$e^{2\rho^2/\sigma^2} = n^{1-2\epsilon} = \frac{n}{q^2},$$

hence

$$\int \frac{g^2}{f} \le e.$$

Since $Q(\mu,\theta) = 0$ under $f$, and $Q(\mu,\theta) = \frac{1}{n}q\rho^4$ under $g$, it follows from CRI that

$$R^*(n,\Omega(\beta,\epsilon,b)) \ge c\left(\frac{1}{n}q\rho^4\right)^2 = c\,n^{2\epsilon-2}(\log n)^4.$$

□


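B.1.3 Proof of Lower Bound in Theorem 5

Consider testing between

$$H_0 : \theta \in \Theta_0(n), \qquad H_1 : \theta \in \Theta_1(n).$$

Let $f$ be the density of a prior supported on $\Theta_0(n)$, and let $g$ be the density of a prior supported on
$\Theta_1(n)$. Using the relationship

$$\inf_\psi\left[\sup_{\theta\in\Theta_0(n)} E_\theta(\psi) + \sup_{\theta\in\Theta_1(n)} E_\theta(1 - \psi)\right] \ge \inf_\psi\left[E_f(\psi) + E_g(1 - \psi)\right] = 1 - \frac{1}{2}\int |g - f|,$$

and

$$\int |g - f| = \int \frac{|g - f|}{\sqrt{f}}\sqrt{f} \le \left(\int \frac{(g - f)^2}{f}\right)^{1/2} = \left(\int \frac{g^2}{f} - 1\right)^{1/2},$$

it follows that to show

$$\inf_\psi\left[\sup_{\theta\in\Theta_0(n)} E_\theta(\psi) + \sup_{\theta\in\Theta_1(n)} E_\theta(1 - \psi)\right] \to 1,$$

it suffices to establish that $\int g^2/f \to 1$.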
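The chain of inequalities linking the $L_1$ distance to the chi-square affinity can be illustrated with two Gaussian point priors, for which both quantities are available in closed form. This is a hedged sketch only: the function names are assumptions, and it uses the standard facts that the $L_1$ distance between $N(0,\sigma^2)$ and $N(\delta,\sigma^2)$ equals $2\,\mathrm{erf}(|\delta|/(2\sqrt{2}\sigma))$ and that their chi-square affinity equals $\exp(\delta^2/\sigma^2)$.

```python
import math


def l1_distance(delta, sigma):
    """Integral of |g - f| for f = N(0, sigma^2), g = N(delta, sigma^2)."""
    return 2.0 * math.erf(abs(delta) / (2.0 * math.sqrt(2.0) * sigma))


def chi_square_bound(delta, sigma):
    """Cauchy-Schwarz bound (affinity - 1)^(1/2) on the L1 distance."""
    return math.sqrt(math.expm1(delta ** 2 / sigma ** 2))


def error_sum_lower_bound(delta, sigma):
    """Lower bound 1 - (1/2)(affinity - 1)^(1/2) on the sum of type I and II errors."""
    return 1.0 - 0.5 * chi_square_bound(delta, sigma)
```

As the shift $\delta$ shrinks, the affinity tends to 1 and the lower bound on the sum of errors tends to 1, which is the mechanism used in the proofs below.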

Below, we use this idea to establish the minimax lower bound that characterizes the undetectable
region in the hypothesis testing problem (31). We divide the proof of the lower bound in Theorem 5 into
two cases: the sparse regime where $0 < \epsilon \le \beta/2$ and the dense regime where $\beta/2 < \epsilon \le \beta$.

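Proof for sparse regime. Suppose that $0 < \epsilon \le \beta/2$ and $a \wedge b < 0$. We will show that there exist $f$ and
$g$ supported on $\Omega_0(\beta,a,b)$ and $\Omega_1(\beta,\epsilon,a,b)$, respectively, such that $\int g^2/f \to 1$.

Without loss of generality, assume $a \ge b$ with $b < 0$. Let

$$f(x_1,\ldots,x_n,y_1,\ldots,y_n) = \prod_{i=1}^{k}\psi_r(x_i)\prod_{i=k+1}^{n}\psi_0(x_i)\prod_{i=1}^{n}\psi_0(y_i).$$

For $I \in \ell(k,q)$, let

$$g_I(x_1,\ldots,x_n,y_1,\ldots,y_n) = \prod_{i=1}^{k}\psi_r(x_i)\prod_{i=k+1}^{n}\psi_0(x_i)\prod_{i=1}^{k}\psi_{\theta_i}(y_i)\prod_{i=k+1}^{n}\psi_0(y_i),$$

where $\theta_i = s1(i \in I)$, and let

$$g = \binom{k}{q}^{-1}\sum_{I\in\ell(k,q)} g_I.$$

The calculation from the proof of Case 1 in Section 6.2 shows that

$$\int \frac{g^2}{f} \le \left(1 + \frac{q}{k}\left[e^{s^2/\sigma^2} - 1\right]\right)^q.$$

Since $s = n^b$ with $b < 0$, it follows that for $n$ sufficiently large,

$$\int \frac{g^2}{f} \le \left(1 + \frac{q}{k}\cdot\frac{s^2}{\sigma^2}(1 + o(1))\right)^q \to 1.$$

□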


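Proof for dense regime. Suppose that $\beta/2 < \epsilon \le \beta$, and either (1) $a \wedge b < (\beta - 2\epsilon)/4$ or (2) $a \vee b < 0$.
We will show that in either scenario, there exist $f$ and $g$ supported on $\Omega_0(\beta,a,b)$ and $\Omega_1(\beta,\epsilon,a,b)$,
respectively, such that $\int g^2/f \to 1$.

(1) Suppose that $a \wedge b < (\beta - 2\epsilon)/4$. Without loss of generality, assume $a \ge b$ with $b < (\beta - 2\epsilon)/4$. Let

$$f(x_1,\ldots,x_n,y_1,\ldots,y_n) = \prod_{i=1}^{k}\psi_r(x_i)\prod_{i=k+1}^{n}\psi_0(x_i)\prod_{i=1}^{n}\psi_0(y_i).$$

For $I \in \ell(k,q)$, let

$$g_I(x_1,\ldots,x_n,y_1,\ldots,y_n) = \prod_{i=1}^{k}\psi_r(x_i)\prod_{i=k+1}^{n}\psi_0(x_i)\prod_{i=1}^{k}\left[\tfrac{1}{2}\psi_{\theta_i}(y_i) + \tfrac{1}{2}\psi_{-\theta_i}(y_i)\right]\prod_{i=k+1}^{n}\psi_0(y_i),$$

where $\theta_i = s1(i \in I)$, and let

$$g = \binom{k}{q}^{-1}\sum_{I\in\ell(k,q)} g_I.$$

The calculation from the proof of Case 4 in Section B.1.2 shows that

$$\int \frac{g^2}{f} \le \left(1 + \frac{q}{k}[\cosh(s^2/\sigma^2) - 1]\right)^q.$$

Since $\cosh(x) = \frac{1}{2}(e^x + e^{-x}) = 1 + \frac{x^2}{2} + o(x^2)$ as $x \to 0$, plugging in $x = s^2/\sigma^2$, it follows that for
$s = n^b$ with $b < (\beta - 2\epsilon)/4$,

$$\int \frac{g^2}{f} \le \left(1 + \frac{q}{k}\cdot\frac{s^4}{2\sigma^4}(1 + o(1))\right)^q = \left(1 + \frac{1 + o(1)}{2\sigma^4}\,n^{4b+\epsilon-\beta}\right)^{n^\epsilon} \to 1.$$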


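(2) Suppose that $a \vee b < 0$. Let

$$f(x_1,\ldots,x_n,y_1,\ldots,y_n) = \prod_{i=1}^{n}\psi_0(x_i)\prod_{i=1}^{n}\psi_0(y_i).$$

For $I \in \ell(n,q)$, let

$$g_I(x_1,\ldots,x_n,y_1,\ldots,y_n) = \prod_{i=1}^{n}\psi_{\mu_i}(x_i)\prod_{i=1}^{n}\psi_{\theta_i}(y_i),$$

where $\mu_i = r1(i \in I)$, $\theta_i = s1(i \in I)$, and let

$$g = \binom{n}{q}^{-1}\sum_{I\in\ell(n,q)} g_I.$$

A calculation similar to that in the proof of Case 2 in Section 6.2 shows that

$$\int \frac{g^2}{f} \le \left(1 - \frac{q}{n} + \frac{q}{n}\,e^{(r^2+s^2)/\sigma^2}\right)^q.$$

Since $a \vee b < 0$, for $r = n^a$ and $s = n^b$, we have

$$\int \frac{g^2}{f} \le \left(1 + \frac{q}{n}\cdot\frac{r^2+s^2}{\sigma^2}(1 + o(1))\right)^q \to 1.$$

□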


B.2 Proofs of Upper Bounds 

In this section, we provide the proof of Theorem 3 and the proof of the upper bound in Theorem 5. To prove
Theorem 3, we compute an upper bound for the supremum risk of the estimator $\hat Q_4$ (defined in (21))
over the parameter space $\Omega(\beta,\epsilon,b)$. We will see that it matches the lower bound
derived in Section B.1 when $b > 0$. We then prove the upper bound in Theorem 5 to show that the
proposed tests asymptotically separate the null hypothesis from the alternative hypothesis over the
detectable regions. In the rest of this section, we denote by $\phi(z)$, $\Phi(z) = P(Z \le z)$, and $\tilde\Phi(z) = 1 - \Phi(z)$ the density,
the cumulative distribution function, and the survival function of a standard normal random variable
$Z$, respectively.

It is trivial to see that for $\hat Q_0 = 0$,

$$\sup_{(\mu,\theta)\in\Omega(\beta,\epsilon,b)} E_{(\mu,\theta)}(\hat Q_0 - Q(\mu,\theta))^2 = \sup_{(\mu,\theta)\in\Omega(\beta,\epsilon,b)}\left(\frac{1}{n}\sum_{i=1}^{n}\mu_i^2\theta_i^2\right)^2 = q^2s^8n^{-2} = n^{2\epsilon+8b-2}. \qquad (39)$$


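B.2.1 Proof of Theorem 3

The proof of Theorem 3 involves a careful analysis of the bias and variance of the proposed estimator
$\hat Q_4$. We will use the following two lemmas.

Lemma 3. Let $X \sim N(\mu,\sigma^2)$, $Y \sim N(\theta,\sigma^2)$. Set $\eta = E_{(0,0)}[(X^2 - \sigma^2)(Y^2 - \sigma^2)1(X^2 \vee Y^2 > \sigma^2\tau)]$,
where the expectation is taken under $\mu = \theta = 0$. Then

$$\eta = -4\sigma^4\tau\phi^2(\tau^{1/2}),$$

and for $\tau \ge 1$,

$$\left|E[(X^2 - \sigma^2)(Y^2 - \sigma^2)1(X^2 \vee Y^2 > \sigma^2\tau)] - \eta - \mu^2\theta^2\right|$$
$$\le \min\{\mu^2, 3\sigma^2\tau\}\min\{\theta^2, 3\sigma^2\tau\} + 2\sigma^2\tau^{1/2}\phi(\tau^{1/2})\min\{\mu^2, 3\sigma^2\tau\} + 2\sigma^2\tau^{1/2}\phi(\tau^{1/2})\min\{\theta^2, 3\sigma^2\tau\}.$$

Lemma 4. Let $X \sim N(\mu,\sigma^2)$, $Y \sim N(\theta,\sigma^2)$. Then for $\tau \ge 1$,

$$\mathrm{Var}[(X^2 - \sigma^2)(Y^2 - \sigma^2)1(X^2 \vee Y^2 > \sigma^2\tau)] \le \begin{cases} 2d^{1/2}\tilde\Phi(\tau^{1/2})^{1/2} & \text{if } \mu = \theta = 0, \\ 16\sigma^4\mu^2\theta^2 + 4\sigma^2\mu^4\theta^2 + 4\sigma^2\mu^2\theta^4 + 2\sigma^4\mu^4 + 2\sigma^4\theta^4 + 8\sigma^6\mu^2 + 8\sigma^6\theta^2 + 4\sigma^8 + 8\sigma^4\mu^2\theta^2\tau^2 & \text{otherwise,} \end{cases}$$

where $d = E_{(0,0)}[(X^2 - \sigma^2)^4(Y^2 - \sigma^2)^4]$.

For clarity, we relegate the proofs of Lemma 3 and Lemma 4 to the end of Section B.2.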


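Proof of Theorem 3. We first compute the bias of the estimator $\hat Q_4$ defined in (21). It follows from
Lemma 3 that for all $(\mu,\theta) \in \Omega(\beta,\epsilon,b)$ and $\tau \ge 1$, we have

$$|E_{(\mu,\theta)}(\hat Q_4) - Q(\mu,\theta)| \le \frac{1}{n}\sum_{i=1}^{n}\left|E[(X_i^2 - \sigma^2)(Y_i^2 - \sigma^2)1(X_i^2 \vee Y_i^2 > \sigma^2\tau)] - \eta - \mu_i^2\theta_i^2\right|$$
$$\le \frac{1}{n}\sum_{i=1}^{n}\left[\min\{\mu_i^2, 3\sigma^2\tau\}\min\{\theta_i^2, 3\sigma^2\tau\} + 2\sigma^2\tau^{1/2}\phi(\tau^{1/2})\min\{\mu_i^2, 3\sigma^2\tau\} + 2\sigma^2\tau^{1/2}\phi(\tau^{1/2})\min\{\theta_i^2, 3\sigma^2\tau\}\right]$$
$$\le \frac{1}{n}\left[\min\{qs^4, 3\sigma^2qs^2\tau, 9\sigma^4q\tau^2\} + 4\sigma^2\tau^{1/2}\phi(\tau^{1/2})\min\{ks^2, 3\sigma^2k\tau\}\right],$$

where the last inequality follows from the fact that for $(\mu,\theta) \in \Omega(\beta,\epsilon,b)$, there are at most $k$ nonzero entries
for either $\mu$ or $\theta$, and there are at most $q$ entries that are simultaneously nonzero for both $\mu$ and $\theta$.

On the other hand, by Lemma 4, for all $(\mu,\theta) \in \Omega(\beta,\epsilon,b)$ and $\tau \ge 1$, the variance of the estimator
$\hat Q_4$ satisfies

$$\mathrm{Var}_{(\mu,\theta)}(\hat Q_4) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}[(X_i^2 - \sigma^2)(Y_i^2 - \sigma^2)1(X_i^2 \vee Y_i^2 > \sigma^2\tau)]$$
$$\le \frac{1}{n^2}\Big[\sum_{i:\,\mu_i=\theta_i=0} 2d^{1/2}\tilde\Phi(\tau^{1/2})^{1/2} + \sum_{i:\,\mu_i\neq 0 \text{ or } \theta_i\neq 0}\big(16\sigma^4\mu_i^2\theta_i^2 + 4\sigma^2\mu_i^4\theta_i^2 + 4\sigma^2\mu_i^2\theta_i^4 + 2\sigma^4\mu_i^4 + 2\sigma^4\theta_i^4 + 8\sigma^6\mu_i^2 + 8\sigma^6\theta_i^2 + 4\sigma^8 + 8\sigma^4\mu_i^2\theta_i^2\tau^2\big)\Big]$$
$$\le \frac{1}{n^2}\left[2nd^{1/2}\tilde\Phi(\tau^{1/2})^{1/2} + 8\sigma^2qs^6 + 16\sigma^4qs^4 + 4\sigma^4ks^4 + 16\sigma^6ks^2 + 8\sigma^8k + 8\sigma^4qs^4\tau^2\right]$$
$$\le \frac{C}{n^2}\max\{n\tilde\Phi(\tau^{1/2})^{1/2},\ qs^4,\ qs^6,\ k,\ ks^2,\ ks^4,\ qs^4\tau^2\}.$$

Again, the second to last inequality follows from the fact that for $(\mu,\theta) \in \Omega(\beta,\epsilon,b)$, there are
at most $k$ nonzero entries for either $\mu$ or $\theta$, and there are at most $q$ entries that are simultaneously
nonzero for both $\mu$ and $\theta$.

Combining the bias and variance terms, we have

$$\sup_{(\mu,\theta)\in\Omega(\beta,\epsilon,b)} E_{(\mu,\theta)}(\hat Q_4 - Q(\mu,\theta))^2 \le \frac{C}{n^2}\Big[\min\{q^2s^8, q^2s^4\tau^2, q^2\tau^4\} + \tau\phi^2(\tau^{1/2})\min\{k^2s^4, k^2\tau^2\} + \max\{n\tilde\Phi(\tau^{1/2})^{1/2},\ qs^4,\ qs^6,\ k,\ ks^2,\ ks^4,\ qs^4\tau^2\}\Big].$$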
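Let $\tau = 4\log n$; then we have $\tilde\Phi(\tau^{1/2}) \le C\phi(\tau^{1/2}) = O(n^{-2})$ for some constant $C$. It follows that for
$b > 0$,

$$\sup_{(\mu,\theta)\in\Omega(\beta,\epsilon,b)} E_{(\mu,\theta)}(\hat Q_4 - Q(\mu,\theta))^2 \le C\max\left\{n^{2\epsilon-2}(\log n)^4,\ n^{\epsilon+6b-2},\ n^{\beta+4b-2}\right\} \le C\gamma_n(\beta,\epsilon,b), \qquad (40)$$

where for $\beta/2 < \epsilon \le 3\beta/4$,

$$\gamma_n(\beta,\epsilon,b) = \begin{cases} n^{2\epsilon-2}(\log n)^4 & \text{if } 0 < b \le \frac{2\epsilon-\beta}{4}, \\ n^{\beta+4b-2} & \text{if } \frac{2\epsilon-\beta}{4} < b \le \frac{\beta-\epsilon}{2}, \\ n^{\epsilon+6b-2} & \text{if } b > \frac{\beta-\epsilon}{2}, \end{cases}$$

and for $3\beta/4 < \epsilon \le \beta$,

$$\gamma_n(\beta,\epsilon,b) = \begin{cases} n^{2\epsilon-2}(\log n)^4 & \text{if } 0 < b \le \frac{\epsilon}{6}, \\ n^{\epsilon+6b-2} & \text{if } b > \frac{\epsilon}{6}. \end{cases}$$

One can also check that when $\beta/2 < \epsilon \le \beta$ and $s = \sigma\sqrt{d\log n}$,

$$\sup_{\substack{(\mu,\theta):\ \|\mu\|_0\le k_n,\ \|\theta\|_0\le k_n,\ \|\mu\star\theta\|_0\le q_n,\\ \|\mu\|_\infty\le\sigma\sqrt{d\log n},\ \|\theta\|_\infty\le\sigma\sqrt{d\log n}}} E_{(\mu,\theta)}(\hat Q_4 - Q(\mu,\theta))^2 \le C\,n^{2\epsilon-2}(\log n)^4. \qquad (41)$$

Combining (39), (40) and (41) matches the lower bounds in Theorem 4.

□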
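The piecewise dense-regime rate for $b > 0$ is simply the largest of the three terms $n^{2\epsilon-2}(\log n)^4$, $n^{\beta+4b-2}$ and $n^{\epsilon+6b-2}$, compared by their powers of $n$. The sketch below checks this bookkeeping. It is illustrative only: the function names are assumptions, and the thresholds $(2\epsilon-\beta)/4$, $(\beta-\epsilon)/2$ and $\epsilon/6$ are obtained by equating pairs of exponents, with $3\beta/4$ assumed to separate the moderately and strongly dense regimes.

```python
def dense_rate_terms(beta, eps, b):
    """Powers of n of the three competing terms (log factors ignored)."""
    return (2 * eps - 2, beta + 4 * b - 2, eps + 6 * b - 2)


def dense_rate_exponent(beta, eps, b):
    """Power of n in the dense-regime rate for b > 0, written piecewise."""
    if not (b > 0 and beta / 2 < eps <= beta):
        raise ValueError("requires b > 0 and beta/2 < eps <= beta")
    if eps <= 0.75 * beta:
        # Moderately dense regime: three pieces.
        if b <= (2 * eps - beta) / 4:
            return 2 * eps - 2
        if b <= (beta - eps) / 2:
            return beta + 4 * b - 2
        return eps + 6 * b - 2
    # Strongly dense regime: the middle piece disappears.
    if b <= eps / 6:
        return 2 * eps - 2
    return eps + 6 * b - 2
```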


B.2.2 Proof of Upper Bound in Theorem 5


We establish an upper bound for the hypothesis testing problem (31) by constructing tests that
asymptotically separate the null hypothesis from the alternative hypothesis over the detectable regions.
This, together with the lower bound derived in Section B.1.3 that characterizes the undetectable
regions, completes the proof of Theorem 5. We divide the proof into two cases: the sparse regime
where $0 < \epsilon \le \beta/2$, and the dense regime where $\beta/2 < \epsilon \le \beta$.

Proof for sparse regime. Suppose that 0 < e < ^ and a A 6 > 0. We will show that in this case, the 
test ijj* = 1{Q2 > A„), where <52 is defined in (15) with r = logn and (or, say, 

Xn = has sum of maximal type I error and maximal type II error goes to 0 as n tends to 

infinity. 

Following similar variance and bias calculation as in the proof of Theorem in Section 6.1 while 
allowing for different signal strengths r = n“ and s = nf for /r and 6, we see that under Hq where 

Qn — 0, 


sup iQ 2 > Xr, 

{ti,d)&Uo{0AX) 


On the other hand, under Pfi, since 


< C 


Var(^_0)(Q2) 

A 2 

nl^+^ayb- 2 _j_ iQg J.J 


< sup 

{li,9)£Uoil3,a,b) 


n 


2e+4aH-4b 


0 . 


-E(^,e)(Q 2 ) > Q{p,0) - ^ min{2o-V,6»2} + (9f min{ 2 (T^r,/x^}] 


2=1 


> log 


n, 
we have


sup -P(^,e)(Q2 < An) < 

(/x,0)ef2l(/3,e,a,b) 


sup 


Var(^^e)(Q2) 


{^J.,e)enll0,e,a,b) {E(^^g-^{Q2) — An)^ 
^ 6+4(2 V b+2(3.A6 

< c . ^ ^ 0. 


n2^+4“+4fe(l + o(l)) 


□ 


Proof for dense regime. Suppose that $\beta/2 < \epsilon \le \beta$, $a \wedge b > (\beta - 2\epsilon)/4$ and $a \vee b > 0$. We will show that in this
case, the test $\psi^* = 1(\hat Q_4 > \lambda_n)$, where $\hat Q_4$ is as defined in (21) with $\tau = 4\log n$ and $\lambda_n = \frac{1}{2}n^{\epsilon+2a+2b-1}$,
has a sum of maximal type I error and maximal type II error that goes to 0 as $n$ tends to infinity.

We follow similar variance and bias calculations as in the proof of Theorem 3 in Section B.2.1, while
allowing for different signal strengths $r = n^a$ and $s = n^b$ for $\mu$ and $\theta$. Under $H_0$, by Lemma 3,


+{M)(^4)| < —/c++(r^/^) < Cn^ ^{lognf/^ = o(An), 
so when n is sufficiently large, 

Var(^e)(Q4) 


sup P{^j.,e)iQ4: > An) < sup 


(/i,0)er2o(/3,a,b) 


{fj.,9)&Qo{l3,a,b) (An P{^,d){Qi))^ 


„/3+4aVfe 

< r— _ 

— j^2e+4a+4fe 


0 . 


Also, under Hi, since 


1 


P{tj.,e){Q^) > 9) — — min{(jrr^s^, 2>a'^qr‘^T, Sa'^qs'^T, da^qr^} 

+ minj/cA, 3+/cr} 

+ 2a‘^T^^‘^(l){T^P) min{A:s^, S+fer} 
= + o(l)), 


we have 


sup 

(/ 2 , 0 )Gni (/3,e,a,6) 


P{f,,e){Qi < >^n) < sup 


Var(^^0)(Q4) 


(M,®)Sf2i(/3,e,a,fe) {E(fj,,d)iQA) An)^ 

e+4aV6+2aA& i ^/3+4aVb 
< _III_ _V n 

— j.j^2e+4a+4b(-]^ _j_ o(l)) 


□ 


B.2.3 Proof of Lemma 

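Let $B(\theta) = E(Y^2 - \tau\sigma^2)_+ - \theta_0$. We first note that $B(-\theta) = B(\theta) \ge 0$ for $\theta \ge 0$. This follows from

$$B'(\theta) = 2\sigma[\phi(\tau^{1/2} - \theta/\sigma) - \phi(\tau^{1/2} + \theta/\sigma)] - 2\theta[\Phi(\tau^{1/2} - \theta/\sigma) - \Phi(-\tau^{1/2} - \theta/\sigma) - 1] \ge 0$$

and $B(0) = 0$. So we have $B(\theta) = E(Y^2 - \tau\sigma^2)_+ - \theta_0 \ge 0$ for all $\theta \in \mathbb{R}$. It follows that
$(E[(Y^2 - \tau\sigma^2)_+ - \theta_0])^2 \le (E(Y^2 - \tau\sigma^2)_+)^2 \le E[(Y^2 - \tau\sigma^2)_+^2]$. To bound the term $E[(Y^2 - \tau\sigma^2)_+^2]$, we consider
two cases: $\theta \le \sigma$ and $\theta > \sigma$. It follows from the proof of Lemma 1 in Cai and Low (2005) that when
$\theta \le \sigma$, then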
9 < a, then 

-2 _ 2^2 , 44r^/^ + 18 


E[{Y^ - Ta%] < 6a^9^ + 


r/2 


On the other hand, when $\theta > \sigma$, we have

$$E[(Y^2 - \tau\sigma^2)_+^2] \le E[Y^4] = \theta^4 + 6\sigma^2\theta^2 + 3\sigma^4 \le 10\theta^4.$$

It follows that we have


(£'[(y"^ — Tn(T^)+ — < max + a 

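B.2.4 Proof of Lemma 3

The proof of Lemma 3 is built on Lemma 5 and Lemma 6.

Lemma 5. Let $Y \sim N(\theta,\sigma^2)$. Then for $\tau \ge 1$,

$$E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] = \theta^2\left[\tilde\Phi\left(-\tau^{1/2} - \frac{\theta}{\sigma}\right) - \tilde\Phi\left(\tau^{1/2} - \frac{\theta}{\sigma}\right)\right] + \phi\left(\tau^{1/2} + \frac{\theta}{\sigma}\right)\left[-\sigma^2\tau^{1/2} + \sigma\theta\right] + \phi\left(\tau^{1/2} - \frac{\theta}{\sigma}\right)\left[-\sigma^2\tau^{1/2} - \sigma\theta\right].$$

In particular, when $\theta = 0$,

$$E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] = -2\sigma^2\tau^{1/2}\phi(\tau^{1/2}).$$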
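Proof. Let $\lambda = \tau^{1/2}$. We have

$$E[Y^2 1(Y^2 \le \sigma^2\tau)] = \int_{-\sigma\lambda}^{\sigma\lambda} y^2\,\frac{1}{\sqrt{2\pi}\sigma}\,e^{-\frac{(y-\theta)^2}{2\sigma^2}}\,dy = \int_{-\lambda-\theta/\sigma}^{\lambda-\theta/\sigma}(\theta + \sigma z)^2\,\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz$$
$$= \theta^2\int_{-\lambda-\theta/\sigma}^{\lambda-\theta/\sigma}\phi(z)\,dz + 2\sigma\theta\int_{-\lambda-\theta/\sigma}^{\lambda-\theta/\sigma} z\phi(z)\,dz + \sigma^2\int_{-\lambda-\theta/\sigma}^{\lambda-\theta/\sigma} z^2\phi(z)\,dz.$$

Using the fact that

$$\int_a^\infty \phi(z)\,dz = \tilde\Phi(a), \qquad \int_a^\infty z\phi(z)\,dz = \phi(a), \qquad \int_a^\infty z^2\phi(z)\,dz = a\phi(a) + \tilde\Phi(a),$$

we have

$$E[Y^2 1(Y^2 \le \sigma^2\tau)] = \theta^2[\tilde\Phi(-\lambda - \theta/\sigma) - \tilde\Phi(\lambda - \theta/\sigma)] + 2\sigma\theta[\phi(-\lambda - \theta/\sigma) - \phi(\lambda - \theta/\sigma)]$$
$$+\ \sigma^2[(-\lambda - \theta/\sigma)\phi(-\lambda - \theta/\sigma) + \tilde\Phi(-\lambda - \theta/\sigma) - (\lambda - \theta/\sigma)\phi(\lambda - \theta/\sigma) - \tilde\Phi(\lambda - \theta/\sigma)]$$
$$= (\theta^2 + \sigma^2)[\tilde\Phi(-\lambda - \theta/\sigma) - \tilde\Phi(\lambda - \theta/\sigma)] + \phi(\lambda + \theta/\sigma)[-\sigma^2\lambda + \sigma\theta] + \phi(\lambda - \theta/\sigma)[-\sigma^2\lambda - \sigma\theta],$$

the last equality due to $\phi(-\lambda - \theta/\sigma) = \phi(\lambda + \theta/\sigma)$. The proof is complete since $\sigma^2E[1(Y^2 \le \sigma^2\tau)] =
\sigma^2[\tilde\Phi(-\lambda - \theta/\sigma) - \tilde\Phi(\lambda - \theta/\sigma)]$. □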
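The closed form in Lemma 5 can be compared with direct numerical integration. The sketch below is illustrative only: the function names are assumptions, the survival function is computed from `math.erfc`, and the integral over $|y| \le \sigma\tau^{1/2}$ uses a plain trapezoidal rule.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def surv(z):
    """Standard normal survival function 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def truncated_moment(theta, sigma, tau):
    """Closed form of E[(Y^2 - sigma^2) 1(Y^2 <= sigma^2 tau)], Y ~ N(theta, sigma^2)."""
    lam, u = math.sqrt(tau), theta / sigma
    return (theta ** 2 * (surv(-lam - u) - surv(lam - u))
            + phi(lam + u) * (-sigma ** 2 * lam + sigma * theta)
            + phi(lam - u) * (-sigma ** 2 * lam - sigma * theta))


def truncated_moment_numeric(theta, sigma, tau, steps=20000):
    """Trapezoidal approximation of the same expectation."""
    lo = -sigma * math.sqrt(tau)
    h = -2.0 * lo / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (y * y - sigma ** 2) * phi((y - theta) / sigma) / sigma
    return total * h


def theta0_closed_form(sigma, tau):
    """Special case theta = 0: -2 sigma^2 tau^(1/2) phi(tau^(1/2))."""
    return -2.0 * sigma ** 2 * math.sqrt(tau) * phi(math.sqrt(tau))
```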

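Lemma 6. Let $Y \sim N(\theta,\sigma^2)$ and set $\theta_0 = E_0[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)]$, where the expectation is taken
under $\theta = 0$. Then for $\tau \ge 1$,

$$\left|E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] - \theta_0\right| \le \min\{\theta^2, 3\sigma^2\tau\}.$$

Proof. Let $\lambda = \tau^{1/2}$. It follows from Lemma 5 that $\theta_0 = -2\sigma^2\lambda\phi(\lambda)$. Set $B(\theta) = E[(Y^2 - \sigma^2)1(Y^2 \le
\sigma^2\tau)] - \theta_0$; then

$$E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] \le E[Y^2 1(Y^2 \le \sigma^2\tau)] \le \sigma^2\lambda^2,$$

and

$$E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] = E(Y^2 - \sigma^2) - E[(Y^2 - \sigma^2)1(Y^2 > \sigma^2\tau)] \ge \theta^2 - E(Y^2) = -\sigma^2 \ge -\sigma^2\lambda^2,$$

hence

$$|B(\theta)| \le \left|E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)]\right| + |\theta_0| \le \sigma^2\lambda^2 + 2\sigma^2\lambda\phi(\lambda) \le 3\sigma^2\lambda^2 = 3\sigma^2\tau.$$

Straightforward calculation yields for $\theta \ge 0$

$$B'(\theta) = \sigma(1 + \lambda^2)[\phi(\lambda + \theta/\sigma) - \phi(\lambda - \theta/\sigma)] + 2\theta[\tilde\Phi(-\lambda - \theta/\sigma) - \tilde\Phi(\lambda - \theta/\sigma)], \qquad (42)$$

$$B''(\theta) = \phi(\lambda + \theta/\sigma)[-\lambda^2(\lambda + \theta/\sigma) - \lambda + \theta/\sigma] + \phi(\lambda - \theta/\sigma)[-\lambda^2(\lambda - \theta/\sigma) - \lambda - \theta/\sigma] + 2[\tilde\Phi(-\lambda - \theta/\sigma) - \tilde\Phi(\lambda - \theta/\sigma)]. \qquad (43)$$

It suffices to only consider $\theta \ge 0$ since $B(\theta) = B(-\theta)$. It follows from (42) that for all $\theta \ge 0$,
$B'(\theta) \le 2\theta$. Since $B(0) = 0$, it follows that

$$B(\theta) \le \theta^2.$$

On the other hand, $\theta_0 < 0$ immediately gives $B(\theta) \ge -\sigma^2 \ge -\theta^2$ for $\theta \ge \sigma$. For $0 \le \theta < \sigma$, we
have $\sigma(1 + \lambda^2) \ge 2\theta$. For $x > 0$, we have $\tilde\Phi(x) \le x^{-1}\phi(x)$, so $\tilde\Phi(-\lambda - \theta/\sigma) = 1 - \tilde\Phi(\lambda + \theta/\sigma) \ge
1 - (\lambda + \theta/\sigma)^{-1}\phi(\lambda + \theta/\sigma)$. It then follows from (42) that for $0 \le \theta < \sigma$,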


5'(0) > 29[(fiX + 9/a) - cfiX - 9/a) + l>(-A - 9/a) - l>(A - 9/a)] 

> 29[1 + (1 - (A + 9/a)-^)(j)iX + 9/a) - (/{X - 9/a) - $(A - O/a)] 

>29 1 + (1 - (A + 0/cr) ^)(l)iX + 9/a) --j=-- 

> 0 . 


Coupled with $B(0) = 0$, this implies that $B(\theta) \ge 0 \ge -\theta^2$ for $0 \le \theta < \sigma$. Hence,

$$B(\theta) \ge -\theta^2.$$

Therefore, for all $\theta$, we have

$$|B(\theta)| \le \theta^2.$$

□
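The bound of Lemma 6 can be probed numerically on a grid. The sketch below is a sanity check rather than a proof: the function names are assumptions, and it evaluates $B(\theta) = E[(Y^2-\sigma^2)1(Y^2 \le \sigma^2\tau)] - \theta_0$ through the closed form of Lemma 5.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def surv(z):
    """Standard normal survival function 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def truncated_moment(theta, sigma, tau):
    """E[(Y^2 - sigma^2) 1(Y^2 <= sigma^2 tau)] for Y ~ N(theta, sigma^2)."""
    lam, u = math.sqrt(tau), theta / sigma
    return (theta ** 2 * (surv(-lam - u) - surv(lam - u))
            + phi(lam + u) * (-sigma ** 2 * lam + sigma * theta)
            + phi(lam - u) * (-sigma ** 2 * lam - sigma * theta))


def bias_term(theta, sigma, tau):
    """B(theta): the truncated moment minus its value at theta = 0."""
    return truncated_moment(theta, sigma, tau) - truncated_moment(0.0, sigma, tau)


def lemma6_bound(theta, sigma, tau):
    """Right-hand side min{theta^2, 3 sigma^2 tau} of Lemma 6."""
    return min(theta ** 2, 3.0 * sigma ** 2 * tau)
```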
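Proof of Lemma 3. Let $\theta_0 = E_0[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] = -2\sigma^2\tau^{1/2}\phi(\tau^{1/2})$, the second equality due
to Lemma 5. It follows from the expression

$$E[(X^2 - \sigma^2)(Y^2 - \sigma^2)1(X^2 \vee Y^2 > \sigma^2\tau)] = \mu^2\theta^2 - E[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)]\,E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)]$$

and

$$\eta = E_{(0,0)}[(X^2 - \sigma^2)(Y^2 - \sigma^2)1(X^2 \vee Y^2 > \sigma^2\tau)] = -E_0[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)]\,E_0[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] = -\theta_0^2$$

that we have

$$\left|E[(X^2 - \sigma^2)(Y^2 - \sigma^2)1(X^2 \vee Y^2 > \sigma^2\tau)] - \eta - \mu^2\theta^2\right| = \left|E[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)]\,E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] - \theta_0^2\right|. \qquad (44)$$

Using the decomposition

$$AB - ab = (A - a)(B - b) + a(B - b) + b(A - a),$$

and the triangle inequality, we get

$$\left|E[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)]\,E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] - \theta_0^2\right|$$
$$\le \left|E[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)] - \theta_0\right|\left|E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] - \theta_0\right|$$
$$+\ |\theta_0|\left|E[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)] - \theta_0\right| + |\theta_0|\left|E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)] - \theta_0\right|$$
$$\le \min\{\mu^2, 3\sigma^2\tau\}\min\{\theta^2, 3\sigma^2\tau\} + 2\sigma^2\tau^{1/2}\phi(\tau^{1/2})\min\{\mu^2, 3\sigma^2\tau\} + 2\sigma^2\tau^{1/2}\phi(\tau^{1/2})\min\{\theta^2, 3\sigma^2\tau\},$$

where the last inequality follows from Lemma 6 and substitution of the value of $\theta_0$.

□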


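B.2.5 Proof of Lemma 4

We have

$$\mathrm{Var}[(X^2 - \sigma^2)(Y^2 - \sigma^2)1(X^2 \vee Y^2 > \sigma^2\tau)]$$
$$= E[(X^2 - \sigma^2)^2(Y^2 - \sigma^2)^2 1(X^2 \vee Y^2 > \sigma^2\tau)] - \{E[(X^2 - \sigma^2)(Y^2 - \sigma^2)1(X^2 \vee Y^2 > \sigma^2\tau)]\}^2$$
$$= E[(X^2 - \sigma^2)^2(Y^2 - \sigma^2)^2] - E[(X^2 - \sigma^2)^2 1(X^2 \le \sigma^2\tau)(Y^2 - \sigma^2)^2 1(Y^2 \le \sigma^2\tau)]$$
$$-\ \{E[(X^2 - \sigma^2)(Y^2 - \sigma^2)] - E[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)]\}^2$$
$$= \mathrm{Var}[(X^2 - \sigma^2)(Y^2 - \sigma^2)] - E[(X^2 - \sigma^2)^2 1(X^2 \le \sigma^2\tau)]\,E[(Y^2 - \sigma^2)^2 1(Y^2 \le \sigma^2\tau)]$$
$$-\ \{E[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)]\,E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)]\}^2$$
$$+\ 2\mu^2\theta^2 E[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)]\,E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)]$$
$$\le \mathrm{Var}[(X^2 - \sigma^2)(Y^2 - \sigma^2)] + 2\mu^2\theta^2 E[(X^2 - \sigma^2)1(X^2 \le \sigma^2\tau)]\,E[(Y^2 - \sigma^2)1(Y^2 \le \sigma^2\tau)]$$
$$\le \mathrm{Var}[(X^2 - \sigma^2)(Y^2 - \sigma^2)] + 8\sigma^4\mu^2\theta^2\tau^2.$$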
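Straightforward calculation yields

$$\mathrm{Var}[(X^2 - \sigma^2)(Y^2 - \sigma^2)]$$
$$= \mathrm{Var}(X^2 - \sigma^2)\mathrm{Var}(Y^2 - \sigma^2) + [E(X^2 - \sigma^2)]^2\mathrm{Var}(Y^2 - \sigma^2) + \mathrm{Var}(X^2 - \sigma^2)[E(Y^2 - \sigma^2)]^2$$
$$= [4\sigma^2\mu^2 + 2\sigma^4][4\sigma^2\theta^2 + 2\sigma^4] + \mu^4[4\sigma^2\theta^2 + 2\sigma^4] + \theta^4[4\sigma^2\mu^2 + 2\sigma^4]$$
$$= 4\sigma^2\mu^4\theta^2 + 4\sigma^2\mu^2\theta^4 + 2\sigma^4\mu^4 + 2\sigma^4\theta^4 + 8\sigma^6\mu^2 + 8\sigma^6\theta^2 + 16\sigma^4\mu^2\theta^2 + 4\sigma^8.$$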
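The expansion of the variance of the product is a polynomial identity and can be checked mechanically. The sketch below is illustrative only: the function names are assumptions, and it uses the standard facts $E(X^2 - \sigma^2) = \mu^2$ and $\mathrm{Var}(X^2 - \sigma^2) = 4\sigma^2\mu^2 + 2\sigma^4$ for $X \sim N(\mu,\sigma^2)$, together with independence of $X$ and $Y$.

```python
def var_product_formula(mu, theta, sigma):
    """Var[(X^2 - s^2)(Y^2 - s^2)] via Var(A)Var(B) + E(A)^2 Var(B) + Var(A) E(B)^2."""
    var_a = 4 * sigma ** 2 * mu ** 2 + 2 * sigma ** 4      # Var(X^2 - sigma^2)
    var_b = 4 * sigma ** 2 * theta ** 2 + 2 * sigma ** 4   # Var(Y^2 - sigma^2)
    mean_a, mean_b = mu ** 2, theta ** 2                   # E(X^2 - sigma^2), E(Y^2 - sigma^2)
    return var_a * var_b + mean_a ** 2 * var_b + var_a * mean_b ** 2


def var_product_expanded(mu, theta, sigma):
    """The same quantity written out as the eight-term polynomial."""
    s2, m2, t2 = sigma ** 2, mu ** 2, theta ** 2
    return (4 * s2 * m2 ** 2 * t2 + 4 * s2 * m2 * t2 ** 2
            + 2 * s2 ** 2 * m2 ** 2 + 2 * s2 ** 2 * t2 ** 2
            + 8 * s2 ** 3 * m2 + 8 * s2 ** 3 * t2
            + 16 * s2 ** 2 * m2 * t2 + 4 * s2 ** 4)
```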

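When $\mu = \theta = 0$, let $d = E_{(0,0)}[(X^2 - \sigma^2)^4(Y^2 - \sigma^2)^4] < \infty$, and we have

$$\mathrm{Var}[(X^2 - \sigma^2)(Y^2 - \sigma^2)1(X^2 \vee Y^2 > \sigma^2\tau)]$$
$$\le E[(X^2 - \sigma^2)^2(Y^2 - \sigma^2)^2 1(X^2 \vee Y^2 > \sigma^2\tau)]$$
$$\le \left(E[(X^2 - \sigma^2)^4(Y^2 - \sigma^2)^4]\,P(X^2 \vee Y^2 > \sigma^2\tau)\right)^{1/2}$$
$$= d^{1/2}\left(1 - P(|Z| \le \tau^{1/2})^2\right)^{1/2}, \quad \text{where } Z \sim N(0,1),$$
$$\le (2d)^{1/2}\left(1 - P(|Z| \le \tau^{1/2})\right)^{1/2}$$
$$= 2d^{1/2}\tilde\Phi(\tau^{1/2})^{1/2},$$

where the second inequality follows from the Cauchy-Schwarz inequality.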



